OM nucleic - nucleic search, using sw model
Run on:         February 7, 2002, 11:08:54 ; Search time 3842.15 Seconds
                                           (without alignments)
                                           1764.726 Million cell updates/sec

Title:          US-09-394-745-6603
Perfect score:  411
Sequence:       1 agcaaaagcatagagatcca aggagaagaggaagggaccg 411

Scoring table:  IDENTITY_NUC
                Gapop 10.0 , Gapext 1.0

Searched:       1472140 seqs, 8248589755 residues

Total number of hits satisfying chosen parameters:      2944280

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries



Database :      GenEmbl:*
                1:  gb_ba:*
                2:  gb_htg:*
                3:  gb_in:*
                4:  gb_om:*
                5:  gb_ov:*
                6:  gb_pat:*
                7:  gb_ph:*
                8:  gb_pl:*
                9:  gb_pr:*
                10:
                11:
                    gb_vi:*


                16:  em_fun:*
                17:  em_hum:*
                18:  em_in:*
                19:  em_om:*
                20:  em_or:*
                21:  em_ov:*
                22:  em_pat:*
                23:  em_ph:*
                24:  em_pl:*
                25:  em_ro:*
                26:  em_sts:*
                27:  em_sy:*
                28:  em_un:*
                29:  em_vi:*
                30:  em_htgo_hum:*
                31:  em_htgo_inv:*
                32:  em_htgo_rod:*
                33:  em_htg_hum:*
                34:  em_htg_inv:*
                35:  em_htg_rod:*
                36:  em_htg_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
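
The header parameters (sw model, IDENTITY_NUC scoring table, Gapop 10.0, Gapext 1.0) describe a local alignment with affine gap penalties. As a rough illustration only, the sketch below computes a Smith-Waterman score with a Gotoh-style recurrence. The function name, the match and mismatch values, and the convention that the first gapped position costs Gapop and each further one costs Gapext are assumptions; none of them is taken from this report or from the search program itself.

```python
def smith_waterman_score(query, target, match=1.0, mismatch=-1.0,
                         gap_open=10.0, gap_ext=1.0):
    """Best local alignment score with affine gaps (illustrative sketch).

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_ext.
    Comparison is case-insensitive because the report prints the query in
    lower case and the database sequence in upper case.
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0.0] * cols        # H values of the previous query row
    f_prev = [neg_inf] * cols    # gap states that consume query residues
    best = 0.0
    for i in range(1, len(query) + 1):
        h_cur = [0.0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # gap state that consumes target residues
        for j in range(1, cols):
            e = max(h_cur[j - 1] - gap_open, e - gap_ext)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_ext)
            same = query[i - 1].lower() == target[j - 1].lower()
            diag = h_prev[j - 1] + (match if same else mismatch)
            h_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Under these assumed values, a query aligned against an identical sequence scores its own length, which is consistent with the reported perfect score of 411 for the 411-base query.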



                              SUMMARIES
                  %
Result           Query
  No.   Score    Match   Length  DB  ID          Description

c   1     45     10.9     2217   3  SAHNRNPH
c   2   42.4     10.3    27570   2  AC069197    AC069197 Homo sapi
c   3   42.2     10.3    76734   2  AC023218
    4   41.4     10.1    45313   1  SCD95A
                                 2  AP004043
    7
c   8   40.2      9.8               AF233344
c   9   40.2      9.8            2
   10   39.8      9.7            1
c  11   39.8      9.7    56494   2
c  12             9.6            1
c  13
c  14   39.2      9.5            2
c  15     39      9.5               AE005747    AE005747 Caulobact
   16     39      9.5    19791   1  SPFKBAD     Y10438 Streptomyce
c  17     39      9.5            2  HSS171M 2   Continuation (3 of
c  18     39      9.5            9  HS21C102    AL163302 Homo sapi
c  19             9.4            2
   20   38.6      9.4            2              AC037447 Homo sapi
   21   38.4      9.3            9  AP001046    AP001046 Homo sapi
c  22             9.3            2  AC018734    AC018734 Homo sapi
c  23   38.4      9.3            2  AC016018    AC016018 Mus muscu
   24   38.4      9.3   340000   9
   25
   26   38.2      9.3
c                                2  AC055848    AC055848 Homo sapi
   29     38      9.2    87076   9  AC005918    AC005918 Homo sapi
c  30             9.2            2  AC074307    AC074307 Mus muscu
   31             9.1    83393   2              Continuation (4 of
   32             9.1            1  AF082100    AF082100 Streptomy
   33   37.2      9.1   141279   2              AC023824 Homo sapi
c  34   37.2      9.1            2              AC087563 Homo sapi
c  35     37      9.0            1  AE003962
c  36     37      9.0            1              AE007069 Mycobacte
   37     37      9.0            1  SC3F9       AL023862 Streptomy
c  40     37      9.0    76336   2  AC023249    AC023249 Homo sapi
c  41     37      9.0   223940   2  AC087567    AC087567 Mus muscu
   42   36.8      9.0     9963   1  AE005878    AE005878 Caulobact
   43   36.8      9.0    63449   2  AC026976    AC026976 Homo sapi
   44   36.8      9.0    82024   2  AC023210    AC023210 Homo sapi
c  45   36.8      9.0   163275   2  AC007623    AC007623 Homo sapi


ALIGNMENTS 



RESULT 1
SAHNRNPH/c
LOCUS       SAHNRNPH     2217 bp    mRNA            INV       04-MAR-1991
DEFINITION  S. americana hnRNP mRNA for protein homologous to A1, A2/B1
            proteins of mammalian hnRNP.
ACCESSION   X54670
VERSION     X54670.1  GI:10106
KEYWORDS    hnRNP A1 protein; hnRNP A2/B1 protein; hnRNP protein.
SOURCE      American grasshopper.
  ORGANISM  Schistocerca americana
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Orthopteroidea; Orthoptera; Caelifera;
            Acridomorpha; Acridoidea; Acrididae; Schistocerca.
REFERENCE   1  (bases 1 to 2217)
  AUTHORS   Ball,E.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-AUG-1990) Ball E.E., Australian National University,
            Molecular Neurobiology Group RSBS, P O Box 475, Canberra City
            A.C.T. 2601, Australia
REFERENCE   2  (bases 1 to 2217)
  AUTHORS   Ball,E.E., Rehm,E.J. and Goodman,C.S.
  TITLE     Cloning of a grasshopper cDNA coding for a protein homologous to
            the A1, A2/B1 proteins of mammalian hnRNP
  JOURNAL   Nucleic Acids Res. 19 (2), 397 (1991)
  MEDLINE   91195067
FEATURES             Location/Qualifiers
     source          1..2217
                     /organism="Schistocerca americana"
                     /db_xref="taxon:7009"
                     /dev_stage="35-50% embryo"
                     /clone_lib="gt11"
     mRNA            <94..2217
                     /gene="hnRNP"
                     /note="putative"
     gene            94..2217
                     /gene="hnRNP"
     CDS             94..1122
                     /gene="hnRNP"
                     /note="putative"
                     /codon_start=1
                     /product="mammalian A1, A2/B1 hnRNP homologue"
                     /protein_id="CAA38481.1"
                     /db_xref="GI:10107"
                     /db_xref="SWISS-PROT:P21522"
                     /translation="MVNMDRENDHGEPEHVRKLFIGGLDYRTTDESLKQHFEQWGEIV
                     DVVVMKDPKTKRSRGFGFITYSRAHMVDDAQNARPHKVDGRVVEPKRAVPRTEIGRPE
                     AGATVKKLFVGGIKEEMEENDLRDYFKQYGTVVSAAIVVDKETRKKRGFAFVEFDDYD
                     PVDKICLSRNHQIRGKHIDVKKALPKGDAPGGRGGGGRGGVGGGAGGGWGGGRGDWGG
                     SAGGGGGGGWGGADPWENGRGGGGDRWGGGGGGMGGGDRWGGGGGMGGGDRYGGGGGR
                     SGGWSNDGYNSGPQSDGFGGGYKQSYGGGAVRGSSGYGGSRSAPYSDRGSRGGGGGGY
                     GSGGGRRY"
     mat_peptide     94..1119
                     /gene="hnRNP"
                     /note="putative"
                     /product="mammalian A1, A2/B1 hnRNP homologue"
     protein_bind    118..375
                     /gene="hnRNP"
                     /note="containing RNP CS#1"
                     /bound_moiety="RNA"
     misc_feature    265..288
                     /gene="hnRNP"
                     /note="RNP CS#1"
     protein_bind    391..648
                     /gene="hnRNP"
                     /note="containing RNP CS#2"
                     /bound_moiety="RNA"
     misc_feature    538..561
                     /gene="hnRNP"
                     /note="RNP CS#2"
BASE COUNT      592 a    354 c    629 g    642 t
ORIGIN



  Query Match             10.9%;  Score 45;  DB 3;  Length 2217;
  Best Local Similarity   51.8%;  Pred. No. 0.16;
  Matches  102;  Conservative    0;  Mismatches   95;  Indels    0;  Gaps    0;



Qy 73 cctagcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaacc 132 

II I I 1 11 I I I I I I I I I I I I II I I I I I I I I III III 
Db 891 CCCTCCCATTCCACCACCCCCACCCCACCTATCTCCACCACCCATACCGCCACCACCACC 832 

Qy 133 accacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggc 192

I I I I I I I I III II I I MM Ml III I II I II 

Db 831 TCCCCATCGGTCACCGCCACCACCACGGCCATTCTCCCATGGATCAGCACCTCCCCAGCC 772 

Qy 193 atggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgc 252 

I I I I II I M I I M I I I II I I I I I II I 

Db 771 ACCACCTCCACCGCCACCAGCACTTCCACCCCAGTCACCACGTCCTCCGCCCCAGCCACC 712

Qy 253 gccgccgacctctccaa 269 

II I I I I II I I 

Db 711 ACCTGCGCCACCACCAA 695 
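
The percentages in the statistics block appear to follow directly from the counts printed beside them. The sketch below shows the relations that reproduce the figures in this report: Query Match as the hit score divided by the perfect score (411 here), and Best Local Similarity as matches divided by aligned positions. Both relations are inferred from the numbers in this report rather than from any documentation, the function names are hypothetical, and counting indels in the denominator is an assumption (every hit shown has 0 indels).

```python
def query_match_percent(score, perfect_score):
    """Hit score as a percentage of the query's perfect (self) score."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches, mismatches, indels=0):
    """Matches as a percentage of aligned positions (assumed definition)."""
    return 100.0 * matches / (matches + mismatches + indels)


# Values copied from RESULT 1 of this report.
print(round(query_match_percent(45, 411), 1))     # Query Match
print(round(best_local_similarity(102, 95), 1))   # Best Local Similarity
```

The same two relations also reproduce the percentages reported for the other hits, for example score 42.4 against the perfect score 411, and 97 matches with 95 mismatches.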



RESULT 2
AC069197/c
LOCUS       AC069197    27570 bp    DNA             HTG       17-FEB-2001
DEFINITION  Homo sapiens chromosome 15 clone RP11-505I24 map 15, LOW-PASS
            SEQUENCE SAMPLING.
ACCESSION   AC069197
VERSION     AC069197.3  GI:12958076
KEYWORDS    HTG; HTGS_PHASE0.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 27570)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 15, clone RP11-505I24
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 27570)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Baldwin,J., Barna,N., Bastien,V., Beda,F.,
            Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G.,
            Campopiano,A., Castle,A., Choepel,Y., Colangelo,M., Collins,S.,
            Collymore,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S.,
            Dodge,S., Domino,M., Doyle,M., Ferreira,P., FitzHugh,W., Gage,D.,
            Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
            Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
            Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
            Klein,J., LaRocque,K., Lamazares,R., Landers,T., Lehoczky,J.,
            Levine,R., Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N.,
            McCarthy,M., McEwan,P., McGurk,A., McKernan,K., McPheeters,R.,
            Meldrim,J., Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J.,
            Murphy,T., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
            O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K., Pierre,N.,
            Pisani,C., Pollara,V., Raymond,C., Riley,R., Rogov,P., Rothman,D.,
            Roy,A., Santos,R., Schauer,S., Severy,P., Spencer,B.,
            Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
            Tesfaye,S., Theodore,J., Tirrell,A., Travers,M., Trigilio,J.,
            Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J.,
            Young,G., Zainoun,J., Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-MAY-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Feb 17, 2001 this sequence version replaced gi:11120926.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/ MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L9315
            Center clone name: 505 I 24



            * NOTE: This record contains 34 individual
            * sequencing reads that have not been assembled into
            * contigs. Runs of N are used to separate the reads
            * and the order in which they appear is completely
            * arbitrary. Low-pass sequence sampling is useful for
            * identifying clones that may be gene-rich and allows
            * overlap relationships among clones to be deduced.
            * However, it should not be assumed that this clone
            * will be sequenced to completion. In the event that
            * the record is updated, the accession number will
            * be preserved.



            *        1    719: contig of 719 bp in length
            *      720    819: gap of 100 bp
            *      820   1529: contig of 710 bp in length
            *     1530   1629: gap of 100 bp
            *     1630   2340: contig of 711 bp in length
            *     2341   2440: gap of 100 bp
            *     2441   3132: contig of 692 bp in length
            *     3133   3232: gap of 100 bp
            *     3233   3971: contig of 739 bp in length
            *     3972   4071: gap of 100 bp
            *     4072   4818: contig of 747 bp in length
            *     4819   4918: gap of 100 bp
            *     4919   5620: contig of 702 bp in length
            *     5621   5720: gap of 100 bp
            *     5721   6419: contig of 699 bp in length
            *     6420   6519: gap of 100 bp
            *     6520   7235: contig of 716 bp in length
            *     7236   7335: gap of 100 bp
            *     7336   8075: contig of 740 bp in length
            *     8076   8175: gap of 100 bp
            *     8176   8878: contig of 703 bp in length
            *     8879   8978: gap of 100 bp
            *     8979   9704: contig of 726 bp in length
            *     9705   9804: gap of 100 bp
            *     9805  10508: contig of 704 bp in length
            *    10509  10608: gap of 100 bp
            *    10609  11332: contig of 724 bp in length
            *    11333  11432: gap of 100 bp
            *    11433  12149: contig of 717 bp in length
            *    12150  12249: gap of 100 bp
            *    12250  12963: contig of 714 bp in length
            *    12964  13063: gap of 100 bp
            *    13064  13791: contig of 728 bp in length
            *    13792  13891: gap of 100 bp
            *    13892  14564: contig of 673 bp in length
            *    14565  14664: gap of 100 bp
            *    14665  15357: contig of 693 bp in length
            *    15358  15457: gap of 100 bp
            *    15458  16191: contig of 734 bp in length
            *    16192  16291: gap of 100 bp
            *    16292  16999: contig of 708 bp in length
            *    17000  17099: gap of 100 bp
            *    17100  17822: contig of 723 bp in length
            *    17823  17922: gap of 100 bp
            *    17923  18639: contig of 717 bp in length
            *    18640  18739: gap of 100 bp
            *    18740  19458: contig of 719 bp in length
            *    19459  19558: gap of 100 bp
            *    19559  20275: contig of 717 bp in length
            *    20276  20375: gap of 100 bp
            *    20376  21104: contig of 729 bp in length
            *    21105  21204: gap of 100 bp
            *    21205  21938: contig of 734 bp in length
            *    21939  22038: gap of 100 bp
            *    22039  22756: contig of 718 bp in length
            *    22757  22856: gap of 100 bp
            *    22857  23573: contig of 717 bp in length

            *    25186  25285: gap of 100 bp
            *    25286  25970: contig of 685 bp in length
            *    25971  26070: gap of 100 bp
            *    26071  26764: contig of 694 bp in length
            *    26765  26864: gap of 100 bp
            *    26865  27570: contig of 706 bp in length
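
Each coordinate row in the COMMENT block above gives an inclusive start and end position, so the stated length should equal end - start + 1. The sketch below is a hypothetical helper (the names ROW, span_length and check_row are illustrative, not part of any tool) for cross-checking a row's stated length against its coordinates. It assumes a row of the form "START END: contig of N bp in length" or "START END: gap of N bp", optionally preceded by "*".

```python
import re

# Matches e.g. "5721 6419: contig of 699 bp in length" or "* 720 819: gap of 100 bp".
ROW = re.compile(r"^\*?\s*(\d+)\s+(\d+):\s+(contig|gap)\s+of\s+(\d+)\s+bp")


def span_length(start, end):
    """Length of an inclusive 1-based coordinate range."""
    return end - start + 1


def check_row(line):
    """Return (kind, length computed from coordinates, length stated in the row)."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError("not a contig/gap row: %r" % line)
    start, end = int(m.group(1)), int(m.group(2))
    return m.group(3), span_length(start, end), int(m.group(4))


print(check_row("5721 6419: contig of 699 bp in length"))
print(check_row("* 25186 25285: gap of 100 bp"))
```

A row is internally consistent when the second and third elements of the returned tuple agree.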



FEATURES             Location/Qualifiers
     source          1..27570
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="15"
                     /map="15"
                     /clone="RP11-505I24"
                     /clone_lib="RPCI-11 Human Male BAC"
BASE COUNT     7225 a   4533 c   5245 g   7049 t   3518 others
ORIGIN



  Query Match             10.3%;  Score 42.4;  DB 2;  Length 27570;
  Best Local Similarity   50.5%;  Pred. No. 0.72;
  Matches   97;  Conservative    0;  Mismatches   95;  Indels    0;  Gaps    0;

Qy 76 agcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccacc 135

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I

Db 25940 ACCAACACACAAACACACACAACACAACACAAAACCAACCAACCACCAAACAAAACCCAC 25881

Qy 136 acaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatg 195

I I I I III I I I I . I I II I I I I I I I I

Db 25880 AAACCACCACCCACCCCACCCCCCCCCNCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 25821

Qy 196 gtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgcc 255

I I I I I I I I I I I I III III I I I I III I I I I

Db 25820 CCCCCCCCCCCCCCCCCCCCCCCCNCCCCCCCCCCCCCCCCCCCCNCCNCCCCCCCCCCC 25761

Qy 256 gccgacctctcc 267

11.11 I II

Db 25760 CCCCCCCCCCCC 25749
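
In the alignments for hits flagged "/c" (RESULT 1 to RESULT 3), the Db coordinates run in descending order, for example 25940 down to 25749 here, whereas the unflagged SCD95A hit runs upward from 40284. This suggests that "c" marks a match against the complementary strand of the database entry. The sketch below illustrates that reading; the helper names are hypothetical, and the interpretation of the flag is an assumption drawn from the coordinate order rather than from the search program's documentation.

```python
# Translation table for the IUPAC bases that occur in these alignments.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a nucleotide string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand(db_start, db_end):
    """Guess the matched strand from the order of the Db coordinates."""
    return "complement" if db_start > db_end else "forward"


def window_length(db_start, db_end):
    """Number of database positions covered by an alignment window."""
    return abs(db_end - db_start) + 1


print(strand(25940, 25749), window_length(25940, 25749))
print(strand(40284, 40343), window_length(40284, 40343))
```

For RESULT 2 the window length agrees with the 97 matches plus 95 mismatches reported in its statistics block.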



RESULT 3
AC023218/c
LOCUS       AC023218    76734 bp    DNA             HTG       13-JUL-2000
DEFINITION  Homo sapiens chromosome 14 clone RP11-105M4 map 14, LOW-PASS
            SEQUENCE SAMPLING.
ACCESSION   AC023218
VERSION     AC023218.2  GI:9165238
KEYWORDS    HTG; HTGS_PHASE0.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 76734)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 14, clone RP11-105M4
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 76734)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Baldwin,J., Barna,N., Beckerly,R., Beda,F.,
            Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G., Castle,A.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
            DeArellano,K., Dewar,K., Domino,M., Doyle,M., Fenestor,J.,
            Ferreira,P., FitzHugh,W., Forrest,C., Gage,D., Galagan,J.,
            Gardyna,S., Grant,G., Hagos,B., Heaford,A., Horton,L.,
            Howland,J.C., Johnson,R., Jones,C., Kann,L., Karatas,A., Klein,J.,
            Landers,T., Lehoczky,J., Levine,R., Lieu,C., Liu,G., Locke,K.,
            Macdonald,P., Marquis,N., McEwan,P., McGurk,A., McKernan,K.,
            McPheeters,R., Meldrim,J., Meneus,L., Morrow,J., Naylor,J.,
            Norman,C.H., O'Connor,T., O'Donnell,P., Olivar,T.M., Peterson,K.,
            Pierre,N., Pisani,C., Pollara,V., Raymond,C., Riley,R., Rothman,D.,
            Roy,A., Santos,R., Severy,P., Spencer,B., Stange-Thomann,N.,
            Stojanovic,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Tirrell,A., Vassiliev,H., Viel,R., Vo,A., Wu,X., Wyman,D., Ye,W.J.,
            Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-FEB-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Jul 13, 2000 this sequence version replaced gi:6957752.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/ MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L6632
            Center clone name: 105 M 4



            * NOTE: This record contains 81 individual
            * sequencing reads that have not been assembled into
            * contigs. Runs of N are used to separate the reads
            * and the order in which they appear is completely
            * arbitrary. Low-pass sequence sampling is useful for
            * identifying clones that may be gene-rich and allows
            * overlap relationships among clones to be deduced.
            * However, it should not be assumed that this clone
            * will be sequenced to completion. In the event that
            * the record is updated, the accession number will
            * be preserved.

            *     5628   5727: gap of 100 bp
            *     5728   6551: contig of 824 bp in length
            *     6552   6651: gap of 100 bp
            *     6652   7527: contig of 876 bp in length
            *     7528   7627: gap of 100 bp
            *     7628   8459: contig of 832 bp in length
            *     8460   8559: gap of 100 bp
            *     8560   9444: contig of 885 bp in length

            *     9545  10368: contig of 824 bp in length
            *    10369  10468: gap of 100 bp
            *    10469  11325: contig of 857 bp in length

            *    29452  30292: contig of 841 bp in length

            *    30393  31253: contig of 861 bp in length
            *    31254  31353: gap of 100 bp
            *    31354  32178: contig of 825 bp in length
            *    32179  32278: gap of 100 bp
            *    32279  33145: contig of 867 bp in length
            *    33146  33245: gap of 100 bp
            *    33246  34087: contig of 842 bp in length
            *    34088  34187: gap of 100 bp
            *    34188  35016: contig of 829 bp in length

            *    37884  37983: gap of 100 bp

            *    39875  40726: contig of 852 bp in length

  Query Match             10.3%;  Score 42.2;  DB 2;  Length 76734;
  Best Local Similarity   46.8%;  Pred. No. 0.79;
  Matches  125;  Conservative    0;  Mismatches  142;  Indels    0;  Gaps    0;

Qy 1 agcaaaagcatagagatccatcttctctgctcaatcaattacacaacaagagcattctag 60 

I I I I I I I II I I I I I I I I I I I III I 

Db 21681 ACCCATAACATCCCCACCACANATAACAACCTAATAACCCACTCCCCTCCACCCTCTCAC 21622 

Qy 61 atttgagttcatcctagcgataccaatacacccatcccaacactccaaaccaaccaacac 120 

I I I I II I I I I I III I I I I I I I I I I M I I I I I I I * 
Db 21621 CCCTCAAAACACCCACCCCAAACACACCCATCACACCCAACACTACCAATCTACCAACCA 21562 

Qy 121 ttcaaccaaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagc 180 

I III III Ml III II I II II I I I I I I II. 
Db 21561 TCGAAACCCACCCCCAANACCACNCCACAACACCCNCCACACCAACTAATCTACCTAATC 21502 

Qy 181 gtcgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccct 240 

III II III III I I I I I III III I I I I I 

Db 21501 CCTCTCCCCATCACTCACTACCACACACCCACCCTTCTCACCCCCCCCTCCCACACCCCC 21442

Qy 241 cacgcctggcgcgccgccgacctctcc 267 

! I I I I I 1 I I I I I I 
Db 21441 CCCCCCATCCCCCACCCACACACCCCC 21415



RESULT 4
SCD95A

LOCUS       SCD95A      45313 bp    DNA             BCT       31-MAY-2000
DEFINITION  Streptomyces coelicolor cosmid D95A.
ACCESSION   AL357432
VERSION     AL357432.1  GI:8248766
KEYWORDS    acetyltransferase; ATP/GTP binding protein; carboxylesterase;
            chaperonin 2; cold shock protein; deacetylase; DNA-binding protein;
            helicase; histidine autokinase; homeostasis protein; integral
            membrane protein; metallopeptidase; mutT-like protein;
            NADP-dependent alcohol dehydrogenase; oxidoreductase; phosphatase;
            possible trehalose-phosphate synthase; reductase; regulatory
            protein; response regulatory protein; secreted protein; secreted
            solute-binding protein; sugar kinase; threonine synthase;
            transcriptional regulator.
SOURCE      Streptomyces coelicolor A3(2).
  ORGANISM  Streptomyces coelicolor A3(2)
            Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
            Actinomycetales; Streptomycineae; Streptomycetaceae; Streptomyces.
REFERENCE   1  (bases 1 to 45313)
  AUTHORS   Redenbach,M., Kieser,H.M., Denapaite,D., Eichner,A., Cullum,J.,
            Kinashi,H. and Hopwood,D.A.
  TITLE     A set of ordered cosmids and a detailed genetic and physical map
            for the 8 Mb Streptomyces coelicolor A3(2) chromosome
  JOURNAL   Mol. Microbiol. 21 (1), 77-96 (1996)
  MEDLINE   97000351
REFERENCE   2  (bases 1 to 45313)
  AUTHORS   Seeger,K.J. and Harris,D.
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 45313)
  AUTHORS   Cerdeno,A.M., Parkhill,J., Barrell,B.G. and Rajandream,M.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-MAY-2000) Streptomyces coelicolor sequencing project,
            Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge
            CB10 1SA E-mail: barrell@sanger.ac.uk Cosmids supplied by Prof.
            David A. Hopwood, [3] John Innes Centre, Norwich Research Park,
            Colney, Norwich, Norfolk NR4 7UH, UK
COMMENT     Notes:
            Streptomyces coelicolor sequencing at The Sanger Centre is funded
            by the BBSRC and Beowulf Genomics
            Details of S. coelicolor sequencing at the Sanger Centre are
            available on the World Wide Web.
            (URL; http://www.sanger.ac.uk/Projects/S_coelicolor/)
            CDS are numbered using the following system eg SC7B7.01c. SC (S.
            coelicolor), 7B7 (cosmid name), .01 (first CDS), c (complementary
            strand).
            The more significant matches with motifs in the PROSITE database
            are also included but some of these may be fortuitous.
            The length in codons is given for each CDS.
            Usually the highest scoring match found by fasta -o is given for
            CDS which show significant similarity to other CDS in the database.
            The position of possible ribosome binding site sequences are given
            where these have been used to deduce the initiation codon.
            Gene prediction is based on positional base preference in codons
            using a specially developed Hidden Markov Model (Krogh et al.,
            Nucleic Acids Research, 22(22):4768-4778(1994)) and the FramePlot
            program of Bibb et al., Gene 30:157-66(1984) as implemented at
            http://www.nih.go.jp/jun/cgi-bin/frameplot.pl.
            CAUTION: We may not have predicted the
            correct initiation codon. Where possible we choose an initiation
            codon (atg, gtg, ttg or (att)) which is preceded by an upstream
            ribosome binding site sequence (optimally 5-13bp before the
            initiation codon). If this cannot be identified we choose the most
            upstream initiation codon.
            IMPORTANT: This sequence MAY NOT be the entire insert of the
            sequenced clone. It may be shorter because we only sequence
            overlapping sections once, or longer, because we arrange for a
            small overlap between neighbouring submissions.
            Cosmid D95A lies between and overlaps cosmids D86A and D12A on the
            AseI-D genomic restriction fragment.
FEATURES             Location/Qualifiers



source 1. .45313 

/organism="Streptomyces coelicolor A3 (2)" 

/strain="A3 (2) " 

/db_xref="taxon: 100226" 

/clone="cosmid D95A" 
gene 1. .79 

/gene="SCD95A.01" 
misc_f eature 1. .105 

/note="nominal overlap with Streptomyces coelicolor cosmid 

StD8 6A" 

CDS <1. .79 

/gene="SCD95A.01" 

/note="SCD95A. 01, unknown (fragment), len: >25 aa" 

/codon__start=2 

/transl_table=ll 

/product="hypothetical protein SCD95A.01" 
/protein_id="CAB93028 .1" 
/db_xref="GI : 8248767" 

/translation="ISPTQLLDLAPVLAAELGVRVLGEE" 
gene complement ( 91 . .591) 

/gene="SCD95A.02c" 
CDS complement (91 . .591) 

/gene="SCD95A.02c" 

/note="SCD95A.02c, unknown, len: 166 aa" 

/codon_start=l 

/transl_table=ll 

/product="hypothetical protein SCD95A.02c" 
/protein_id="CAB93029 . 1 " 
/db_xref-"GI : 8248768" 

/trans la tion="MLDIGYALSTRFPDPPQTDYRRADVHALRHDLFCGDVYLADTKA 
DRELSTAWGWVPVLDFAWALCDIVEQVDRDPAGSRAARPQRAELDFTESTDRLLFERR 
FGWVDIEAEWLPADEPPLSFSHTELRREARDFLHDLLADLTDLHEDLADNPAVWSLQA 
RFPRIP" 

gene complement ( 775 . .1374) 

/gene="SCD95A.03c" 
CDS complement (775. .1374) 

/gene="SCD95A.03c" 

/note="SCD95A. 03c, possible transcriptional regulator, 
len: 199 aa; similar to TR:CAB53122 (EMBL : ALIO 9962 ) 
Streptomyces coelicolor putative transcriptional 
regulatory protein SCJ1.04, 207 aa; fasta scores: opt: 154 
z-score: 202.7 E(): 0.00079; 36.4% identity in 77 aa 
overlap. Contains Pfam match to entry PF00440 tetR, 
Bacterial regulatory proteins, tetR family" 
/codon_start=l 
/ trans l__table=ll 

/product="putative transcriptional regulator" 
/protein_id="CAB93030 . 1" 
/db_xref="GI: 8248769" 

/trans lation="MPDRTPDQPLTSRGAATHRRILDVATREFAEHGIAGARVERIVA 
AARTNKAQLYAYFGSKDGLFDAIFFGSLDRIVNVVPIDADDLADWAVRLYDEYLCRPD 
LIRLATWARLERRPAGHLVDDADRRDDAKLRAVAEAQAAGRVRPGDPFDVLALVIAMS 
MAWSPVSNVYAATADEPDDVHERRRALLRDAVRRATAPD" 
misc_feature complement ( 1174 . .1314) 
/gene-"SCD95A.03c" 

/note="Pfam match to entry PF00440 -tetR, Bacterial 
regulatory proteins, tetR family, score 34.80, E-value 



6.3e-08" 

RBS complement (1375. .1378) 

RBS 1480. .1485 

gene 1490. .2626 

/gene="SCD95A.04" 
CDS 1490. .2626 

/gene="SCD95A.04" 

/note="SCD95A. 04, probable NADP-dependent alcohol 
dehydrogenase, len: 378 aa; similar to SW:ADH_MYCTU 
(EMBL:AL021287) Mycobacterium tuberculosis NADP-dependent 
alcohol dehydrogenase (EC 1.1.1.2) Adh, 346 aa; fasta 
scores: opt: 986 z-score: 1074.5 E(): 0; 45.0% identity in 
349 aa overlap. Contains Pfam match to entry PF00107 
adh_zinc,- Zinc-binding dehydrogenases and match to Prosite 
entry PS00059 Zinc-containing alcohol dehydrogenases 
signature" 
/codon_start=l 
/trans l_table=ll 

/product="putative NADP-dependent alcohol dehydrogenase" 
/protein_id="CAB93031 .1" 
/db_xref ="GI : 8248770" 

/translation="MRTTVGWQATGPTLRRAPLERRDLRPDDLAVRVDYCGVCHTDLH 
AVRAGAADGSGPRPLVPGHEFTGWTETGTAVTRFRPGDPVAVGNIVDSCGTCAMCRI 
GQENFCHSFPTLTYGGTDRRDGSTTLGGYSREYVLGERFAYALPAALDPAAAAPLLCA 
GITVWEPLRALGAGPGTRVAVAGLGGLGHLAVKLAVALGADTSVISRSPDKAEDARRL 
GARELVVSTDPERLAAARERFDIVVDTVSAPHDLGAYLRLVALDGTLSHLGHLGPVTV 
ETLDLLIGRKKLSSAGSGGRKGTAEMLAFCAEHGITADIELLPSARVNEALDRLDRGD 
VRHRFVLDLSDLDHSGLPDLGLSDPSDPDLRGPDRSGPQDPAAG" 
misc_feature 1496..2515
/gene="SCD95A.04"
/note="Pfam match to entry PF00107 adh_zinc, Zinc-binding
dehydrogenases, score 299.10, E-value 5.5e-86"
misc_feature 1595..1612
/gene="SCD95A.04"
/note="PS00190 Cytochrome c family heme-binding site
signature"
misc_feature 1673..1717
/gene="SCD95A.04"
/note="PS00059 Zinc-containing alcohol dehydrogenases
signature"
gene 2636..3607
/gene="SCD95A.05"
CDS 2636..3607
/gene="SCD95A.05"
/note="SCD95A.05, possible mutT-like protein, len: 323 aa;
similar to TR:O69888 (EMBL:AL023797) Streptomyces
coelicolor hypothetical 19.4 kD protein SC2E1.17 or MutT,
172 aa; fasta scores: opt: 266 z-score: 322.5 E():
1.7e-10; 39.3% identity in 145 aa overlap. Contains Pfam
match to entry PF00293 mutT, Bacterial mutT protein and
match to Prosite entry PS00893 mutT domain signature. High
content in alanine and leucine amino acid residues"
/codon_start=1
/transl_table=11
/product="putative mutT-like protein"
/protein_id="CAB93032.1"
/db_xref="GI:8248771"



/translation="MILTPLALTPDHDIPGPVLTELTALYASHRAFHALSGDFPDPED 
IRPEQVATALADELARPGAEVLLARDAGRLVGVAVTLARHPDPSDPDPWIGLLMVDAG 
LTRQGHGSRLVSLVEDRFRAAGRTAVRLAVLDGNTDALSFWTALGYRVLDHRRDLGAE 
RPCTVLRRELASDKPRTPRRAARVAVLDPEGAVFLLRYDNVEVGVHWAMPGGGLEADE 
NPREGALREVREETGWTDLEPGPLLCTWEHDFTHLSVGPVRQYEHIYVAQGPRREPTG 
PHLAAAHAADGILTWRWWSRAELAEAPEPLWPPDLALLLETFGGREG" 
misc_feature 3212..3337
/gene="SCD95A.05"
/note="Pfam match to entry PF00293 mutT, Bacterial mutT
protein, score 33.50, E-value 4.1e-08"
misc_feature 3269..3328
/gene="SCD95A.05"
/note="PS00893 mutT domain signature"
gene complement(3632..5053)
/gene="SCD95A.06c"
CDS complement(3632..5053)
/gene="SCD95A.06c"

Query Match 10.1%; Score 41.4; DB 1; Length 45313;

Best Local Similarity 47.8%; Pred. No. 1.3; 

Matches 120; Conservative 0; Mismatches 131; Indels 0; Gaps 0; 

Qy 90 acccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttc 149 

I I I I II I II I I I I I I I I I I I II I I II 

Db 40284 ACTCATCCGGGACGCCTACCGCAGCCAGGCCGACAGCCTCCGCGCCGCCGCCGCGTCCGG 40343 

Qy 150 agtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccc 209

I M I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 40344 CGCCGACCTGGCCGGTCTCGCGCACGCCCTGCGCGCCTGGGCCCTCGACGACCCGCAGCG 40403

Qy 210 cttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaa 269

III III I III III I III I I I I I II I I I I I 

Db 40404 CTACTTCCTCATCTTCGGTACCCCCGTCCCCGGCTACCGGGCGCCCGACGACATCACCGA 40463 

Qy 270 gaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggtatccttggatggcc 329 

II III I I I I I I I I I I I I I I I I I I I I III II 

Db 40464 GATCGCCGCCGAGACCATGGCGGTCATCGTCGACGCCTGTGCCGCACTGCCTCCGTCGGA 40523 

Qy 330 agtcgcagccg 340 

I I I I I I 
Db 40524 CGGCACCGACG 40534 
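
The summary figures printed with each hit appear to follow from simple ratios. The sketch below is an illustration only: it assumes Query Match is the hit score divided by the perfect score of 411 given in the search header, and that Best Local Similarity is the match count divided by the aligned columns. The function names are hypothetical, and counting indels as aligned columns is an assumption, since every hit shown here has zero indels.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the perfect score, rounded to one decimal."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, mismatches, indels=0):
    """Matches as a percentage of aligned columns (indel handling is assumed)."""
    aligned = matches + mismatches + indels
    return round(100.0 * matches / aligned, 1)


# Figures taken from the hit summarized directly above.
print(query_match_pct(41.4, 411))
print(best_local_similarity_pct(120, 131))
```

With the figures from the hit above (score 41.4, 120 matches, 131 mismatches) the two functions reproduce the printed 10.1% and 47.8%.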



RESULT 5
AC023519
LOCUS AC023519 60609 bp DNA HTG 13-JUL-2000
DEFINITION Homo sapiens clone RP11-16I22, LOW-PASS SEQUENCE SAMPLING.
ACCESSION AC023519
VERSION AC023519.2 GI:7144934
KEYWORDS HTG; HTGS_PHASE0.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 60609)
AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Homo sapiens, clone RP11-16I22
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 60609)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
Anderson,S., Baldwin,J., Barna,N., Beda,F., Boguslavkiy,L.,
Boukhgalter,B., Brown,A., Burkett,G., Campopiano,A., Castle,A.,
Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
DeArellano,K., Dewar,K., Dodge,S., Domino,M., Doyle,M.,
Fenestor,J., Ferreira,P., FitzHugh,W., Forrest,C., Gage,D.,
Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
Klein,J., Landers,T., Larocque,K., Lehoczky,J., Levine,R.,
Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N., McCarthy,M.,
McEwan,P., McGurk,A., McKernan,K., McPheeters,R., Meldrim,J.,
Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J., Naylor,J.,
Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D., Olivar,T.M.,
Peterson,K., Pierre,N., Pisani,C., Pollara,V., Raymond,C.,
Riley,R., Rogov,P., Rothman,D., Roy,A., Santos,R., Schauer,S.,
Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J., Tirrell,A.,
Travers,M., Trigilio,J., Vassiliev,H., Viel,R., Vo,A., Wilson,B.,
Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J., Zimmer,A. and
Zody,M.
TITLE Direct Submission
JOURNAL Submitted (15-FEB-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT On Mar 3, 2000 this sequence version replaced gi:6978163.
All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center
Center: Whitehead Institute/ MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu
Project Information
Center project name: L3549
Center clone name: 16 I 22


* NOTE: This record contains 69 individual 

* sequencing reads that have not been assembled into 

* contigs. Runs of N are used to separate the reads 

* and the order in which they appear is completely 

* arbitrary. Low-pass sequence sampling is useful for 

* identifying clones that may be gene-rich and allows 

* overlap relationships among clones to be deduced. 

* However, it should not be assumed that this clone 

* will be sequenced to completion. In the event that 

* the record is updated, the accession number will 

* be preserved. 

* 1 779: contig of 779 bp in length 

* 780 879: gap of 100 bp 

* 880 1683: contig of 804 bp in length 

* 1684 1783: gap of 100 bp 

* 1784 2597: contig of 814 bp in length 

* 2598 2697: gap of 100 bp 

* 2698 3494: contig of 797 bp in length 



3495 3594: gap of 
3595 4365: contig 

4366 4465: gap of 
4466 5248: contig 

5249 5348: gap of 
5349 6130: contig 

6131 6230: gap of 
6231 7007: contig 

7008 7107: gap of 
7108 7872: contig 

7873 7972: gap of 
7973 8748: contig 

8749 8848: gap of 
8849 9624: contig 

9625 9724: gap of 
9725 10485: contig 
10486 10585: gap of 
10586 11356: contig 
11357 11456: gap of 
11457 12270: contig 
12271 12370: gap of
12371 13158: contig 
13159 13258: gap of 
13259 14034: contig 
14035 14134: gap of 
14135 14923: contig 
14924 15023: gap of 
15024 15789: contig 
15790 15889: gap of 
15890 16659: contig 
16660 16759: gap of 
16760 17531: contig 
17532 17631: gap of 
17632 18415: contig 
18416 18515: gap of 
18516 19322: contig 
19323 19422: gap of 
19423 20177: contig 
20178 20277: gap of 
20278 21084: contig 
21085 21184: gap of 
21185 21938: contig 
21939 22038: gap of 
22039 22759: contig 
22760 22859: gap of 
22860 23617: contig 
23618 23717: gap of 
23718 24487: contig 
24488 24587: gap of 
24588 25407: contig 
25408 25507: gap of 
25508 26337: contig 
26338 26437: gap of 
26438 27252: contig 
27253 27352: gap of 
27353 28104: contig 
28105 28204: gap of 
28205 28987: contig
28988 29087: gap of 
29088 29793: contig 
29794 29893: gap of 
29894 30682: contig 
30683 30782: gap of 
30783 31589: contig 
31590 31689: gap of 
31690 32450: contig 
32451 32550: gap of 
32551 33304: contig 
33305 33404: gap of 
33405 34214: contig 
34215 34314: gap of 
34315 35074: contig 
35075 35174: gap of 
35175 35907: contig 
35908 36007: gap of 
36008 36794: contig 
36795 36894: gap of 
36895 37673: contig 
37674 37773: gap of 
37774 38563: contig 
38564 38663: gap of 
38664 39472: contig 
39473 39572: gap of 
39573 40366: contig 
40367 40466: gap of 
40467 41250: contig 
41251 41350: gap of 
41351 42103: contig 
42104 42203: gap of 
42204 42970: contig 
42971 43070: gap of 
43071 43841: contig 
43842 43941: gap of 
43942 44742: contig 
44743 44842: gap of 
44843 45620: contig 
45621 45720: gap of 
45721 46513: contig 
46514 46613: gap of 
46614 47373: contig 
47374 47473: gap of 
47474 48250: contig 
48251 48350: gap of 
48351 49143: contig 
49144 49243: gap of 
49244 49999: contig 
50000 50099: gap of 
50100 50886: contig 
50887 50986: gap of 
50987 51802: contig 
51803 51902: gap of 
51903 52707: contig 
52708 52807: gap of 
52808 53625: contig 
* 53626 53725: gap of 100 bp

* 53726 54485: contig of 760 bp in length 

* 54486 54585: gap of 100 bp 

* 54586 55372: contig of 787 bp in length 

* 55373 55472: gap of 100 bp 

* 55473 56229: contig of 757 bp in length 

* 56230 56329: gap of 100 bp 

* 56330 57101: contig of 772 bp in length 

* 57102 57201: gap of 100 bp 

* 57202 57982: contig of 781 bp in length 

* 57983 58082: gap of 100 bp

* 58083 58857: contig of 775 bp in length 

* 58858 58957: gap of 100 bp 

* 58958 59739: contig of 782 bp in length

* 59740 59839: gap of 100 bp 

* 59840 60609: contig of 770 bp in length. 
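
In the layout table above, each length is the inclusive coordinate span (end - start + 1), so the lengths of rows whose text was cut off can be recomputed from the coordinates alone. The sketch below is a minimal illustration; the `parse_layout` helper and its regular expression are hypothetical and only assume the `start end: contig|gap` row shape seen in this record.

```python
import re

# Matches rows such as "* 1 779: contig of 779 bp in length" or "3495 3594: gap of".
ROW = re.compile(r"^\*?\s*(\d+)\s+(\d+):\s*(contig|gap)\b")


def parse_layout(lines):
    """Return (start, end, kind, length) for each recognizable layout row."""
    rows = []
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            start, end = int(m.group(1)), int(m.group(2))
            # Coordinates are inclusive, hence the +1.
            rows.append((start, end, m.group(3), end - start + 1))
    return rows


sample = [
    "* 1 779: contig of 779 bp in length",
    "* 780 879: gap of 100 bp",
    "3495 3594: gap of",
    "3595 4365: contig",
]
for row in parse_layout(sample):
    print(row)
```

Rows that do not fit the assumed shape are skipped rather than guessed at.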
FEATURES Location/Qualifiers 

Query Match 10.0%; Score 41.2; DB 2; Length 60609; 

Best Local Similarity 50.3%; Pred. No. 1.5; 

Matches 94; Conservative 0; Mismatches 93; Indels 0; Gaps 0;

Qy 82 accaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaaca 141 

I I I I I I I I I I I I I I I I II I I I I. I II I I I I I I II 
Db 49656 ACCCACACAACCCACCCCACCCCACCCCCCCCCCACCCCCCCACCCCACCACCACCCCCA 49715 

Qy 142 atgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagag 201 

I I I I I I I I I I I I I ,1 I I I I I I I I 

Db 49716 ACCCCCCCTCCCACACACNCCCCCCCCCCCCACCACCCCCNCAACCCCNACACCCCACCA 49775

Qy 202 ccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgac 261

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 

Db 49776 CCCCACCCCCCCACCTANCNCCCCACCACCCTCACCCCCAAACCACCCCACACCCCCCCC 49835

Qy 262 ctctcca 268 

I I I I I 
Db 49836 CCCACCA 49842 



RESULT 6
AP004043/c
LOCUS AP004043 102242 bp DNA HTG 17-AUG-2001
DEFINITION Oryza sativa chromosome 2 clone OJ1124_D06, *** SEQUENCING IN
PROGRESS ***, in ordered pieces.
ACCESSION AP004043
VERSION AP004043.1 GI:15208411
KEYWORDS HTG; HTGS_PHASE2.
SOURCE Oryza sativa (cultivar:Nipponbare) DNA, clone:OJ1124_D06.
ORGANISM Oryza sativa
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
Ehrhartoideae; Oryzeae; Oryza.
REFERENCE 1 (bases 1 to 102242)
AUTHORS Sasaki,T., Matsumoto,T. and Yamamoto,K.
TITLE Oryza sativa nipponbare (GA3) genomic DNA, chromosome 2, BAC
clone:OJ1124_D06
JOURNAL Published Only in Database (2001) In press
REFERENCE 2 (bases 1 to 102242)
AUTHORS Sasaki,T., Matsumoto,T. and Yamamoto,K.
TITLE Direct Submission
JOURNAL Submitted (15-AUG-2001) Takuji Sasaki, National Institute of
Agrobiological Resources, Rice Genome Research Program; Kannondai
2-1-2, Tsukuba, Ibaraki 305-8602, Japan
(E-mail:tsasaki@nias.affrc.go.jp, URL:http://rgp.dna.affrc.go.jp/,
Tel:81-298-38-7441, Fax:81-298-38-7468)
COMMENT The nucleotide sequence of this BAC clone was generated by
combining Monsanto and RGP-Japan sequencing data.
NOTE: It currently consists of 1 contigs. Gaps between the contigs
are represented as runs of N. The order of the pieces is believed
to be correct as given, however the sizes of the gaps between them
are based on estimates that have provided by the submitter. This
sequence will be replaced by the finished sequence as soon as it is
available and the accession number will be preserved.
NOTE: This is a 'working draft' sequence.
This sequence will be replaced
by the finished sequence as soon as it is available and
the accession number will be preserved.
FEATURES Location/Qualifiers
source 1..102242
/organism="Oryza sativa"
/cultivar="Nipponbare"
/db_xref="taxon:4530"
/chromosome="2"
/clone="OJ1124_D06"
BASE COUNT 29287 a 22590 c 22673 g 27691 t 1 others
ORIGIN



Query Match 9.9%; Score 40.8; DB 2; Length 102242; 

Best Local Similarity 49.5%; Pred. No. 1.8; 

Matches 105; Conservative 0; Mismatches 107; Indels 0; Gaps 0;

Qy 89 cacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgcctt 148

I I I I I I I III I I I I I I I I I I I I I I I I II II 
Db 55012 CTCTCCTCTCCGCCTACCACGACAACCGCCTCTACGACCGCGCCATCCAAGCCTTCCGCA 54953

Qy 149 cagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccacc 208

I I II I I III II III II II III 

Db 54952 CTCTCCCCGCCGAGCTCGGCATCAAGCCCAGCGTCGTCTCTCACAACGTCCTTCTCAAGT 54893 



Qy 209 ccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctcca 268 

I I I I I I III I I I I I I I I I I I I II I I I I I I 

Db 54892 CCTTTGTTGCCAGTGGCGACCTCGCCTCCGCCCGCGCCCTGTTCGATGAAATGCCTTCCA 54833



Qy 269 agaaagtcgtgaagacaagcactgtcttcttc 300 

II I I I I I I I I I I I I I I I I I I 
Db 54832 AGGCTGACGTCGAGCCAGACATTGTCTCCTGC 54801 



RESULT 7
AP003865/c
LOCUS AP003865 125118 bp DNA HTG 10-JUL-2001
DEFINITION Oryza sativa chromosome 8 clone OJ1081_B12, *** SEQUENCING IN
PROGRESS ***, in ordered pieces.
ACCESSION AP003865
VERSION AP003865.1 GI:14646798
KEYWORDS HTG; HTGS_PHASE2.
SOURCE Oryza sativa (cultivar:Nipponbare) DNA, clone:OJ1081_B12.
ORGANISM Oryza sativa
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
Ehrhartoideae; Oryzeae; Oryza.
REFERENCE 1 (bases 1 to 125118)
AUTHORS Sasaki,T., Matsumoto,T. and Yamamoto,K.
TITLE Oryza sativa nipponbare (GA3) genomic DNA, chromosome 8, BAC
clone:OJ1081_B12
JOURNAL Published Only in Database (2001) In press
REFERENCE 2 (bases 1 to 125118)
AUTHORS Sasaki,T., Matsumoto,T. and Yamamoto,K.
TITLE Direct Submission
JOURNAL Submitted (09-JUL-2001) Takuji Sasaki, National Institute of
Agrobiological Resources, Rice Genome Research Program; Kannondai
2-1-2, Tsukuba, Ibaraki 305-8602, Japan
(E-mail:tsasaki@abr.affrc.go.jp, URL:http://rgp.dna.affrc.go.jp/,
Tel:81-298-38-7441, Fax:81-298-38-7468)
COMMENT The nucleotide sequence of this BAC clone was generated by
combining Monsanto and RGP-Japan sequencing data.
NOTE: It currently consists of 1 contigs. Gaps between the contigs
are represented as runs of N. The order of the pieces is believed
to be correct as given, however the sizes of the gaps between them
are based on estimates that have provided by the submitter. This
sequence will be replaced by the finished sequence as soon as it is
available and the accession number will be preserved.
* NOTE: This is a 'working draft' sequence.
* This sequence will be replaced
* by the finished sequence as soon as it is available and
* the accession number will be preserved.
FEATURES Location/Qualifiers
source 1..125118
/organism="Oryza sativa"
/cultivar="Nipponbare"
/db_xref="taxon:4530"
/chromosome="8"
/clone="OJ1081_B12"
BASE COUNT 34332 a 28142 c 28145 g 34297 t 202 others
ORIGIN



Query Match 9.9%; Score 40.8; DB 2; Length 125118; 

Best Local Similarity 46.8%; Pred. No. 1.8;

Matches 87; Conservative 0; Mismatches 99; Indels 0; Gaps 0; 

Qy 82 accaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaaca 141 

II I I I I I I I I I I I I I I I I I I I I I M I I I I I I I III II 
Db 37855 ACANAAAGAACAAAACCAACACAACAAACCAACCACANCAACACCCACAAACACACCCCA 37796

Qy 142 atgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagag 201 

III III IN I I I I III I I 

Db 37795 CAACCCCCCCCCCCCCCNCCCCNNCNNCCCCCCCCCCCCCNCCCCCCCCCCCCNNCNCCC 37736 



Qy 202 ccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgac 261 



Db 37735 CCCCNCCCCCCCCCNCCCCCCNNNCCCCCCCCCCCCCNNCNCCCCCCCNNCNCCCCCCCC 37676

Qy 262 ctctcc 267 
I I I I 

Db 37675 CCCCCC 37670 



RESULT 8
AF233344/c
LOCUS AF233344 3338 bp DNA PRI 15-MAR-2000
DEFINITION Homo sapiens fibroblast growth factor receptor 2 (FGFR2) gene,
promoter sequence.
ACCESSION AF233344
VERSION AF233344.1 GI:7243697
KEYWORDS .
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 3338)
AUTHORS Ricol,D., Cappellen,D., El Marjou,A., Gil-Diez-de-Medina,S.,
Girault,J., Yoshida,T., Ferry,G., Tucker,G., Poupon,M., Chopin,D.,
Thiery,J. and Radvanyi,F.
TITLE Tumour suppressive properties of fibroblast growth factor receptor
2-IIIb in human bladder cancer
JOURNAL Oncogene 18 (51), 7234-7243 (1999)
MEDLINE 20071102
REFERENCE 2 (bases 1 to 3338)
AUTHORS Girault,J., Radvanyi,F. and Ricol,D.
TITLE Direct Submission
JOURNAL Submitted (09-FEB-2000) UMR 144, CNRS-Institut Curie, 26 rue d'Ulm,
Paris 75248, France
FEATURES Location/Qualifiers
source 1..3338
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="10"
/map="10q26"
/clone="PAC6539"
promoter 1..3338
/gene="FGFR2"
gene 1..>3338
/gene="FGFR2"
/note="fibroblast growth factor receptor 2"
BASE COUNT 832 a 733 c 854 g 919 t
ORIGIN



Query Match 9.8%; Score 40.2; DB 9; Length 3338; 

Best Local Similarity 55.3%; Pred. No. 2.9; 

Matches 78; Conservative 0; Mismatches 63; Indels 0; Gaps 0;

Qy 127 caaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcc 186

I I I I I I I I I I I II I I I ' I I I I I I I II I I I I I I I 

Db 3207 CCAGCCCGGAGAGCAGTCGCCGCGCCGGGCCAGGTACGCCGCATGCAGCCCCGCGGCGCC 3148



Qy 187 cgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcc 246

III II I I I I I I I I I I I I I II I I I I I I I I I I I 

Db 3147 CGAGCTTTGTGGCGGCCGCGCGCGCTCCCTCGCCCGCTCCGCACCCGCCGCCCGCCCGCT 3088 

Qy 247 tggcgcgccgccgacctctcc 267 

M I I I I I I I II I I I I 
Db 3087 CGGCTCTCCACCGCGCTCTCC 3067 



RESULT 9
AC009988/c
LOCUS AC009988 179004 bp DNA HTG 05-MAY-2001
DEFINITION Homo sapiens chromosome 10 clone RP11-62L18, WORKING DRAFT
SEQUENCE, 9 unordered pieces.
ACCESSION AC009988
VERSION AC009988.10 GI:13957605
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_FULLTOP; HTGS_ACTIVEFIN.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 179004)
AUTHORS Smith,D.R.
TITLE Genome Therapeutics Corporation Sequencing Center: Human Genome
Sequence Data
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 179004)
AUTHORS Smith,D.R.
TITLE Direct Submission
JOURNAL Submitted (09-SEP-1999) Genome Therapeutics Corporation, 100 Beaver
Street, Waltham, MA 02453, USA
COMMENT On May 5, 2001 this sequence version replaced gi:13605974.
Genome Center
Center: Genome Therapeutics Corporation
Center code: GTC
Web site: http://www.genomecorp.com/
Contact: gtc-seqcenter@genomecorp.com
Project Information
Center project name: hg002
Summary Statistics
Sequencing vector: N/A
Chemistry: Dye-terminator Big Dye; 100% of reads
Assembly program: Phrap; version 990315
Consensus quality: 173303 bases at least Q40
Consensus quality: 174498 bases at least Q30
Consensus quality: 175282 bases at least Q20
Insert size: 178303; sum-of-contigs
Quality coverage: 3.9x in Q20 bases; sum-of-contigs

NOTE: This is a 'working draft' sequence. It currently
consists of 9 contigs. The true order of the pieces
is not known and their order in this sequence record is
arbitrary. Gaps between the contigs are represented as
runs of N, but the exact sizes of the gaps are unknown.
This record will be updated with the finished sequence
as soon as it is available and the accession number will
be preserved.

1 1196: contig of 1196 bp in length
1197 1296: gap of unknown length
1297 2376: contig of 1080 bp in length
2377 2476: gap of unknown length
2477 3501: contig of 1025 bp in length
3502 3601: gap of unknown length
3602 4746: contig of 1145 bp in length
4747 4846: gap of unknown length
4847 6274: contig of 1428 bp in length
6275 6374: gap of unknown length
6375 15038: contig of 8664 bp in length
15039 15138: gap of unknown length
15139 42629: contig of 27491 bp in length
42630 42729: gap of unknown length
42730 98264: contig of 55535 bp in length
98265 98364: gap of unknown length
98365 179004: contig of 80640 bp in length.
FEATURES Location/Qualifiers
source 1..179004
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="10"
/clone="RP11-62L18"
/clone_lib="RPCI-11"
misc_feature 1..1196
/note="assembly_name:Contig1"
misc_feature 1297..2376
/note="assembly_name:Contig2"
misc_feature 2477..3501
/note="assembly_name:Contig3"
misc_feature 3602..4746
/note="assembly_name:Contig4"
misc_feature 4847..6274
/note="assembly_name:Contig7
clone_end:T7"
misc_feature 6375..15038
/note="assembly_name:Contig8
clone_end:SP6"
misc_feature 15139..42629
/note="assembly_name:Contig9"
misc_feature 42730..98264
/note="assembly_name:Contig10"
misc_feature 98365..179004
/note="assembly_name:Contig11"
BASE COUNT 47825 a 41271 c 40632 g 48463 t 813 others
ORIGIN



Query Match 9.8%; Score 40.2; DB 2; Length 179004;

Best Local Similarity 55.3%; Pred. No. 2.6;

Matches 78; Conservative 0; Mismatches 63; Indels 0; Gaps 0;


Qy 127 caaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcc 186 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 46786 CCAGCCCGGAGAGCAGTCGCCGCGCCGGGCCAGGTACGCCGCATGCAGCCCCGCGGCGCC 46727



Qy 187 cgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcc 246

III II I I I I I I I I I I I I I II I I I I III I III

Db 46726 CGAGCTTTGTGGCGGCCGCGCGCGCTCCCTCGCCCGCTCCGCACCCGCCGCCCGCCCGCT 46667



Qy 247 tggcgcgccgccgacctctcc 267 

I I I I II I 1 I I I I I I I 
Db 46666 CGGCTCTCCACCGCGCTCTCC 46646



RESULT 10
AF275943
LOCUS AF275943 11096 bp DNA BCT 02-SEP-2000
DEFINITION Streptomyces avermitilis avermectin polyketide synthase gene,
partial cds.
ACCESSION AF275943
VERSION AF275943.1 GI:9964075
KEYWORDS .
SOURCE Streptomyces avermitilis.
ORGANISM Streptomyces avermitilis
Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
Actinomycetales; Streptomycineae; Streptomycetaceae; Streptomyces.
REFERENCE 1 (bases 1 to 11096)
AUTHORS Hong,Y.-S. and Lee,J.J.
TITLE Targeted Gene Disruption of the avermectin O-methyltransferase gene
and polyketide synthase gene from Streptomyces avermitilis
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 11096)
AUTHORS Hong,Y.-S. and Lee,J.J.
TITLE Direct Submission
JOURNAL Submitted (07-JUN-2000) Anticancer Agent Research Laboratory, Korea
Research Institute of Bioscience and Biotechnology, P.O. Box 116,
YuSong-Gu, Taejon 305-600, South Korea
FEATURES Location/Qualifiers
source 1..11096
/organism="Streptomyces avermitilis"
/strain="ATCC31271"
/db_xref="taxon:33903"
/db_xref="ATCC:31271"
CDS 220..>11096
/note="Avrl"
/codon_start=1
/transl_table=11
/product="avermectin polyketide synthase"
/protein_id="AAG09812.1"
/db_xref="GI:9964076"

/translation="MADEADGGVVFVFPGQGPQWPGMGRELLDASDVFRESVRACEAE 

FAPYVDWSVEQVLRDSPDAPGLDRVDVVQRTLFAVMISLAALWRSQGVEPCAVLGHSL 

GEIAAAHVSGGLSLADAARVGDAWSQAQTTLAGTGALVSVAATPDELLPRIAPWTEDN 

PARLAVAAVNGPRSTVVSGAREAVADLVADLTAAQVRTRMIPVDVPAHSPLMYAIEER 

VVSGLLPITPRPSRIPFHSSVTGGRLDTRELDAAYWYRNMSSTVRFEPAARLLLQQGP 

KTFVEMSPHPVLTMGLQELAADLGDTTGTADTVIMGTLRRGQGTLDHFLTSLAQLRGH 

GETSATTVLSARLTALSPTQQQSLLLDLVRAHTMAVLNDDGNERTASDAGPSASFAHL 

GFDSVMGVELRNRLSKATGLRLPVTLILDHTTPAAVAARLRTAALGHLDEDTAPVPDS 

PSGHGGTAAADDPIAIIGMACRFPGGVRSPKDLWELAASGGDAIGPFPTDRGWPTEQR 

HAQDPTQPGTFYPQGGGFLHDAAHFDAGFFGISPREALAMDPQQRLLLETSWEAFERA 

GIDPLSVRGSRTGVFAGALSFDYGPRMDTASSEGAADVEGHILTGTTGSVLSGRSAYS 

FGLEGPAITVDTGCSASLVTLHLACQSLRSGECTFALAGGVSDVHPGMFIEFSRQCGL

SVDGRCKAVSAAADGTGWGEGIGLLAVVRGSAVNQDGASNGLTAPNGPAQERVIRQAL 



ANAGLSVADVDVVEGHGTGTTLGDPIEAQALLATYGQRAGDRPLWLGSLKSNIGHTMA
AAGVGGVIKMVMALREGVLPRTLHVDEPSPQGLAAGAVRLLTEAVPWPGDAAGRLRRA
GVSSFGIGGTNAHVTLEEAPAAGGCVAGGRVLEGAPGLAISVAESVAAPVAVSAPVAE
SVPVPVPVPVPVPVSARSEAGLRAQAEALRQYEAVQPDVSLADVGAGLACRQAVLEHH
VVILAACTSSSRSAARTTARSSSTARPQARPAPTSARRIGALAAGSGSAALTTGHAPG
GDRGGVVFVFPWQGGQWAGMGVRLLCLLRVFARRMQACEEALAPWVDWSVVDILRRDA
GDAVWEQADVVQPVLFSVMVSLAALWRSYGIEPNEVLGHSKDEIAAAHIYGALSLKDA
AKTVALPPQEVEQLIGERGGRLWVAAVNGPRSTAVSGDAEAVVEVLAYCAGTGVRPRI 
PVDYASHCPHVQPLREELLELLGDISPQPYGVPFFSTVEGTWLDTTTLD7UVYWYRNLH 
QPVRFSHDVQALADQGHRVLLEVSPHPTLVPAIEDTTEDTARRRHCDRQPPPRRERHP 
LLPQRLRLDPYYRHRQTHHVAPLLDPPRHLPPPLDAPRPAHLPFQHQHYWLESSQPGA 
GSGSGAGAGSGAGSGRAGTAGGTAEVESRFWDAVARQDLETVATTLAVPPSAGLDTVV 
PALSAWHRHQHDQARINTWTYQETWKPLTLPTTHQPHQTWLIAIPETQTHHPHITNIL 
TNLHHHGITPIPLPLTTHHTNPQH^HHTLHHTRQQAQNHTTGAITGLLSLLALDETPH 
PHHPHTPTGTLLNLTLTQTHTQTHPPTPLWYATTNATTTHPNDPLTHPTQAQTWGLAR 
TTLLEHPTHTAGIIDLPTTPTPHTLHHLTQTLTQPHHQTQLAIRTTGTHTRRLTPTTL 
TPTHQPPTPTPHGTTLITGGTGALATHLTHHLTTHQPTQHLLLTSRTGPHTPHAQHLT 
TQLQQKGIHLTITTCDTSTPRPTHNNSLNTIPPQHPVTTVIHTGGILDDATLTNLTPT 
QLNNVLRAKAHSAHLLHQLTQHTPLTAFVLYSSAAATFGAPGQANYAAANAYLDALAH 
HRHTHHLPATSIAWGTWQGNGLADSDKARAYLDRRGFRPMSPELATAAVTQAIADTER 
PYVVIADIDWSKIEHTSQTSDLVSAAREREPAVQRPTPPAELHKTLAHQTSADQRAAL 
LELVRDHVAAVLRHADPKAIAPDQSFRALGFDSLTAVEFRNLLIKATGLRLPVSLVFD 
HPTPAKLAVHLQNQLRGTAAESAPSA7VAVTAEASVTEPIAIVGMACRFPGGVTSADDF 
WDLISSEQDAIGGFPTDRGWDLDTLYDPDPDHPGTCYTRNGGFLYDAGHFDAEFFGIS 
PREALAMDPQQRLLLETAWETIEHAGINPHTLHGTPTGVFTGTNGQDYALRVHNAGQS
TDGFALTGTAGSVISGRISYTFGFEGPAVSVDTACSSSLVALHLACQALRAGECSMAL 
AGGVTVMSSPGAFVEFSRQRGLAADGHCKAFSAAADGTGWGEGVGMLLVERLSDAHRN 
GHRVLAVVRGSAVNQDGASNGLTAPNGPSQQRVIRQALANAGLSAGDVDAVEAHGTGT 
TLGDPIEAQALLATYGQDRAGEGPLWLGSVKSNVGHTQAT^AGVAGVIKMVMALRHGLL 
PRTLHVDEPSPHVDWSAGAVQLLTETVPWPGGEGRLRRAGVSSFGVSGTNAHVILEEA 
PADDVPGGPPAGEGDAGSDDEAAAGSPGVWPWLVSAKSQPALRAQAQALHAHLTDHPG 
LDLADVGYTLAHARAVFDHRATLIAADRDTFLQALQALAAGEPHPAVIHSSAPGGTGT 
GEAAGKTAFICSGQGTQRPGMAHGLYHTHPVFAAALNDICTHLDPHLDHPLLPLLTQN 
DNDNEDAAALLQQTRYAQPALFAFQVALHRLLTDGYHITPHYYAGHSLGEITGAHLAG 
ILTLTDATTLITQRATLMQTMPPGTMTTLHTTPHHITHHLTAHENDLAIAAINTPTSL 
VISGTPHTVQHITTLCQQQGIKTKTLPTNHAFHSPHTNPILNQLHQHTQTLTYHPPHT 
PLITANTPPDQLLTPHYWTQQARNTVDYATTTQTLHQHGVTTYIELGPDNTLTTLTHH 
NLPNTPTTTLTLTHPHHHPQTHLLTNLAKTTTTWHPHHYTHHAQPTPHPHPLDLPTYP 
FQHHHYWLESTQPGAGNVSAAGLDPTEHPLLGATLELATDGGALLAGRLSLRSHPWLA 
DHAVGGTVLLSGATFLELALHAGTYVGCDRVDELTLHAPLVVPVDGGVSVQVGVAAAD 
GKGRRLVSVYARGGSACGGGGASGGVWTCHASGVLVEAAAGGVVVDGLAGVWPPRGAV 
AVDVDGVRDRLAGAGCVLGPVFSGLRAVWRDGGDLLAEVCLPEEAWGDAAGFGLHPAL 
LNGVVQPLSVLLPGGTGFGKGAGFGKGVRVPAVWGGVSLHRAGVTGVRVRVSAVGRGG 
GREAVSVVVGDEAGVPVASVDRLELRPVDMGQLRAVSVSAGRRGSLYAVQWAEVGPVP 
VCGQAWAWHEDVGESGGGPVPGVVVLRCPDAGAGGGGGGGGGGGVGEVVGGVLGVVQG 
WLGLERFAGSRLVVVTRGAVVAGPEDGPVDVVGASVWGLVRSAQAEHPDRFVLLDLDT 
DTGTDLDTGAGAGWGVDGGRVAAVVACGEPQLAVRGERLLAARLTRLESSGDVPAQRS 
GDTRARRSDVPAQRSGGVPARRSVDVSGREVLPWLSGGSVLVTGGTGVLGAAVARHLA 
GVCGVRDLLLVSRRGPDAPGAEGLRRSWPRGAEVRIVACDVGERREVVRLAGGCSCRV 
SVFVDFSAVASVTVRLPVASDVRKEAAMAYATVEEFTDYLDPDP" 

BASE COUNT 1756 a 4059 c 3528 g 1753 t

ORIGIN 



Query Match 9.7%; Score 39.8; DB 1; Length 11096;

Best Local Similarity 46.8%; Pred. No. 3.6;

Matches 125; Conservative 0; Mismatches 142; Indels 0; Gaps 0;


Qy 77 gcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccacca 136 

I I I I I I I I I I I I II I I II I I I I I I I I II II I I I 
Db 5441 GCGACACCAGCACACCCAGACCAACTCACAACAACTCACTCAACACCATCCCCCCACAAC 5500

Qy 137 caacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatgg 196 

I I I I I I I I I I III 1 I I M I I I I I II 

Db 5501 ACCCCGTCACCACCGTCATCCACACCGGAGGCATCCTCGACGACGCCACCCTCACCAACC 5560 

Qy 197 tagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccg 256

I I I I I I II I I I I I I I Ml I II I III 

Db 5561 TCACCCCCACCCAACTCAACAACGTCCTCCGCGCCAAAGCCCACAGCGCCCACCTCCTCC 5620 

Qy 257 ccgacctctccaagaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggta 316 

I I I I I I I I I I II I I I I I I I I I I I I III I 

Db 5621 ACCAACTCACCCAACACACCCCCCTCACCGCCTTCGTCCTCTACTCCTCCGCCGCCGCCA 5680 



Qy 317 tccttggatggccagtcgcagccgcct 343 

I I I I I I I I MM II 
Db 5681 CCTTCGGCGCACCCGGCCAAGCCAACT 5707 



RESULT 11
AC083782/c
LOCUS AC083782 56494 bp DNA HTG 30-SEP-2000
DEFINITION Homo sapiens chromosome 12 clone RP11-84L9 map 12, LOW-PASS
SEQUENCE SAMPLING.
ACCESSION AC083782
VERSION AC083782.1 GI:10440689
KEYWORDS HTG; HTGS_PHASE0.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 56494)
AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Homo sapiens chromosome 12, clone RP11-84L9
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 56494)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
Anderson,S., Barna,N., Bastien,V., Beda,F., Boguslavkiy,L.,
Boukhgalter,B., Brown,A., Burkett,G., Campopiano,A., Castle,A.,
Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Ferreira,P.,
FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., Ginde,S., Goyette,M.,
Graham,L., Grand-Pierre,N., Hagos,B., Heaford,A., Horton,L.,
Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A., LaRocque,K.,
Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Lieu,C., Liu,G.,
Macdonald,P., Marquis,N., McCarthy,M., McEwan,P., McKernan,K.,
McPheeters,R., Meldrim,J., Meneus,L., Mihova,T., Mlenga,V.,
Morrow,J., Murphy,T., Naylor,J., Norman,C.H., O'Connor,T.,
O'Donnell,P., O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K.,
Pierre,N., Pisani,C., Pollara,V., Raymond,C., Rieback,M., Riley,R.,
Rogov,P., Rothman,D., Roy,A., Santos,R., Schauer,S., Severy,P.,
Sougnez,C., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
Tirrell,A., Travers,M., Trigilio,J., Vassiliev,H., Viel,R., Vo,A.,
Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J.,
Zimmer,A. and Zody,M.
TITLE Direct Submission
JOURNAL Submitted (30-SEP-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center
Center: Whitehead Institute/ MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu
Project Information
Center project name: L11227
Center clone name: 84 L 9



* NOTE: This record contains 72 individual 

* sequencing reads that have not been assembled into 

* contigs. Runs of N are used to separate the reads 

* and the order in which they appear is completely 

* arbitrary. Low-pass sequence sampling is useful for 

* identifying clones that may be gene-rich and allows 

* overlap relationships among clones to be deduced. 

* However, it should not be assumed that this clone 

* will be sequenced to completion. In the event that 

* the record is updated, the accession number will 

* be preserved. 



* 1 634: contig of 634 bp in length
* 635 734: gap of 100 bp
* 735 1430: contig of 696 bp in length
* 1431 1530: gap of 100 bp
* 1531 2223: contig of 693 bp in length
* 2224 2323: gap of 100 bp
* 2324 3001: contig of 678 bp in length
* 3002 3101: gap of 100 bp
* 3102 3794: contig of 693 bp in length
* 3795 3894: gap of 100 bp
* 3895 4593: contig of 699 bp in length
* 4594 4693: gap of 100 bp
* 4694 5386: contig of 693 bp in length
* 5387 5486: gap of 100 bp
* 5487 6141: contig of 655 bp in length
* 6142 6241: gap of 100 bp
* 6242 6945: contig of 704 bp in length
* 6946 7045: gap of 100 bp
* 7046 7653: contig of 608 bp in length
* 7654 7753: gap of 100 bp
* 7754 8449: contig of 696 bp in length
* 8450 8549: gap of 100 bp
* 8550 9247: contig of 698 bp in length
* 9248 9347: gap of 100 bp
* 9348 10038: contig of 691 bp in length
* 10039 10138: gap of 100 bp
* 10139 10836: contig of 698 bp in length
* 10837 10936: gap of 100 bp
* 10937 11647: contig of 711 bp in length



* 11648 11747: gap of 100 bp
* 11748 12433: contig of 686 bp in length
* 12434 12533: gap of 100 bp
* 12534 13240: contig of 707 bp in length
* 13241 13340: gap of 100 bp
* 13341 14065: contig of 725 bp in length
* 14066 14165: gap of 100 bp
* 14166 14877: contig of 712 bp in length
* 14878 14977: gap of 100 bp
* 14978 15673: contig of 696 bp in length
* 15674 15773: gap of 100 bp
* 15774 16374: contig of 601 bp in length
* 16375 16474: gap of 100 bp
* 16475 17104: contig of 630 bp in length
* 17105 17204: gap of 100 bp
* 17205 17906: contig of 702 bp in length
* 17907 18006: gap of 100 bp
* 18007 18692: contig of 686 bp in length
* 18693 18792: gap of 100 bp
* 18793 19481: contig of 689 bp in length
* 19482 19581: gap of 100 bp
* 19582 20296: contig of 715 bp in length
* 20297 20396: gap of 100 bp
* 20397 21077: contig of 681 bp in length
* 21078 21177: gap of 100 bp
* 21178 21915: contig of 738 bp in length
* 21916 22015: gap of 100 bp
* 22016 22700: contig of 685 bp in length
* 22701 22800: gap of 100 bp
* 22801 23507: contig of 707 bp in length
* 23508 23607: gap of 100 bp
* 23608 24314: contig of 707 bp in length
* 24315 24414: gap of 100 bp
* 24415 25032: contig of 618 bp in length
* 25033 25132: gap of 100 bp
* 25133 25833: contig of 701 bp in length
* 25834 25933: gap of 100 bp
* 25934 26637: contig of 704 bp in length
* 26638 26737: gap of 100 bp
* 26738 27443: contig of 706 bp in length
* 27444 27543: gap of 100 bp
* 27544 28159: contig of 616 bp in length
* 28160 28259: gap of 100 bp
* 28260 28961: contig of 702 bp in length
* 28962 29061: gap of 100 bp
* 29062 29783: contig of 722 bp in length
* 29784 29883: gap of 100 bp
* 29884 30482: contig of 599 bp in length
* 30483 30582: gap of 100 bp
* 30583 31206: contig of 624 bp in length
* 31207 31306: gap of 100 bp
* 31307 31931: contig of 625 bp in length
* 31932 32031: gap of 100 bp
* 32032 32719: contig of 688 bp in length
* 32720 32819: gap of 100 bp
* 32820 33512: contig of 693 bp in length
* 33513 33612: gap of 100 bp

* 33613 34332: contig of 720 bp in length 

* 34333 34432: gap of 100 bp 

* 34433 35123: contig of 691 bp in length 

* 35124 35223: gap of 100 bp 

* 35224 35968: contig of 745 bp in length 

* 35969 36068: gap of 100 bp

* 36069 36747: contig of 679 bp in length 

* 36748 36847: gap of 100 bp 

* 36848 37521: contig of 674 bp in length 

* 37522 37621: gap of 100 bp 

* 37622 38331: contig of 710 bp in length 

* 38332 38431: gap of 100 bp 

* 38432 39140: contig of 709 bp in length

* 39141 39240: gap of 100 bp 

* 39241 39926: contig of 686 bp in length 

* 39927 40026: gap of 100 bp 

* 40027 40718: contig of 692 bp in length 

* 40719 40818: gap of 100 bp 

* 40819 41503: contig of 685 bp in length

* 41504 41603: gap of 100 bp 

* 41604 42310: contig of 707 bp in length 

* 42311 42410: gap of 100 bp 

* 42411 43121: contig of 711 bp in length 

* 43122 43221: gap of 100 bp 

* 43222 43938: contig of 717 bp in length 

* 43939 44038: gap of 100 bp 

* 44039 44715: contig of 677 bp in length 

* 44716 44815: gap of 100 bp 

* 44816 45525: contig of 710 bp in length 

* 45526 45625: gap of 100 bp 

* 45626 46244: contig of 619 bp in length 

* 46245 46344: gap of 100 bp

* 46345 47044: contig of 700 bp in length 

* 47045 47144: gap of 100 bp 

* 47145 47842: contig of 698 bp in length 

* 47843 47942: gap of 100 bp 

* 47943 48554: contig of 612 bp in length 

* 48555 48654: gap of 100 bp 

* 48655 49340: contig of 686 bp in length 

* 49341 49440: gap of 100 bp 

* 49441 50147: contig of 707 bp in length 

* 50148 50247: gap of 100 bp 

* 50248 50970: contig of 723 bp in length 

* 50971 51070: gap of 100 bp 

* 51071 51781: contig of 711 bp in length 

* 51782 51881: gap of 100 bp 

* 51882 52572: contig of 691 bp in length 

* 52573 52672: gap of 100 bp 

* 52673 53370: contig of 698 bp in length 

* 53371 53470: gap of 100 bp 

* 53471 54081: contig of 611 bp in length 

* 54082 54181: gap of 100 bp 

* 54182 54883: contig of 702 bp in length 
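
The layout rows above follow a regular pattern: a start coordinate, an end coordinate, then either a contig or a gap with its length in bp. As a minimal sketch (the function name `parse_layout` and the regular expression are illustrative assumptions, not part of the GenBank record or of any library), the rows can be parsed in Python and each stated length checked against end - start + 1:

```python
import re

# Hypothetical pattern for rows such as "* 33613 34332: contig of 720 bp in length".
ROW = re.compile(r"^\*?\s*(\d+)\s+(\d+):\s*(contig|gap)\s+of\s+(\d+)\s*bp")

def parse_layout(lines):
    """Return (start, end, kind, length) tuples for every layout row found."""
    rows = []
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything that is not a layout row
        start, end, kind, length = int(m[1]), int(m[2]), m[3], int(m[4])
        # Coordinates are inclusive, so the span covers end - start + 1 bases.
        if end - start + 1 != length:
            raise ValueError(f"inconsistent row: {line!r}")
        rows.append((start, end, kind, length))
    return rows

sample = [
    "* 33613 34332: contig of 720 bp in length",
    "",
    "* 34333 34432: gap of 100 bp",
    "* 34433 35123: contig of 691 bp in length",
]
rows = parse_layout(sample)
```

Because the coordinates are inclusive, a row whose length disagrees with its coordinates usually points to a transcription error in one of the three numbers.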



Query Match 9.7%; Score 39.8; DB 2; Length 56494;

Best Local Similarity 58.6%; Pred. No. 3.4;

Matches 68; Conservative 0; Mismatches 48; Indels 0; Gaps 0;



Qy 61 atttgagttcatcctagcgataccaatacacccatcccaacactccaaaccaaccaacac 120 

I I I I I 1 I I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I 

Db 31161 ATTTCCAACCATCCCACCCATTCCATCCCATCCATTCCATCCATTCCATCCATCCATCCA 31102 

Qy 121 ttcaaccaaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcg 176

I II I I I III I I I I II I I I I I I I I I I I I I I III 
Db 31101 TGCATCCATTCCATTCCATCCATGCATGCATCCATCCATACATNTATCATTGAGCG 31046
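
For a gapless alignment such as this one, the summary figures appear to be related by simple arithmetic: Best Local Similarity looks like matches divided by aligned columns, and Query Match like the score divided by the perfect score (taken here to be the 411 listed for the query in the run header). Both relations are assumptions inferred from the numbers in this report, not documented behaviour of the search program. A minimal Python sketch with hypothetical function names:

```python
def best_local_similarity(matches, mismatches, indels=0):
    """Percent of aligned columns that are identical (assumed definition)."""
    columns = matches + mismatches + indels
    return 100.0 * matches / columns

def query_match(score, perfect_score):
    """Score as a percentage of the query's perfect score (assumed definition)."""
    return 100.0 * score / perfect_score

# Figures reported for this result: Matches 68, Mismatches 48, Score 39.8.
similarity = round(best_local_similarity(68, 48), 1)
match_pct = round(query_match(39.8, 411), 1)
```

Under these assumptions the sketch gives 58.6 and 9.7, which agrees with the values printed for this result.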



RESULT 12 

SC6G10/C 

LOCUS       SC6G10 36734 bp DNA BCT 24-MAR-1999
DEFINITION  Streptomyces coelicolor cosmid 6G10.
ACCESSION   AL049497
VERSION     AL049497.1 GI:4539196
KEYWORDS    aminotransferase; cox1; cox2; cox3; cytochrome b; cytochrome c
            oxidase; gene duplication; glycosyl transferase; heme-binding;
            integral membrane protein; long chain fatty acid coA ligase;
            membrane transporter; phosphoribosylanthranilate transferase; qcrA;
            qcrB; qcrC; quinolinate synthetase; Rieske iron-sulfur protein;
            secreted protein; transcriptional regulator; trpD; two component
            sensor kinase/response regulator.
SOURCE      Streptomyces coelicolor A3(2).
  ORGANISM  Streptomyces coelicolor A3(2)
            Bacteria; Firmicutes; Actinobacteria; Actinobacteridae;
            Actinomycetales; Streptomycineae; Streptomycetaceae; Streptomyces.
REFERENCE   1 (bases 1 to 36734)
  AUTHORS   Redenbach,M., Kieser,H.M., Denapaite,D., Eichner,A., Cullum,J.,
            Kinashi,H. and Hopwood,D.A.
  TITLE     A set of ordered cosmids and a detailed genetic and physical map
            for the 8 Mb Streptomyces coelicolor A3(2) chromosome
  JOURNAL   Mol. Microbiol. 21 (1), 77-96 (1996)
  MEDLINE   97000351
REFERENCE   2 (bases 1 to 36734)
  AUTHORS   Seeger,K. and Harris,D.
  JOURNAL   Unpublished
REFERENCE   3 (bases 1 to 36734)
  AUTHORS   Bentley,S.D., Parkhill,J., Barrell,B.G. and Rajandream,M.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-MAR-1999) Streptomyces coelicolor sequencing project,
            Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge
            CB10 1SA E-mail: barrell@sanger.ac.uk Cosmids supplied by Prof.
            David A. Hopwood, [3] John Innes Centre, Norwich Research Park,
            Colney, Norwich, Norfolk NR4 7UH, UK
COMMENT
            Notes:

            Streptomyces coelicolor sequencing at The Sanger Centre is funded
            by the BBSRC and Beowulf Genomics

            Details of S. coelicolor sequencing at the Sanger Centre are
            available on the World Wide Web.

            (URL; http://www.sanger.ac.uk/Projects/S_coelicolor/) CDS are
            numbered using the following system eg SC7B7.01c. SC (S.
            coelicolor), 7B7 (cosmid name), .01 (first CDS), c (complementary
            strand).

            The more significant matches with motifs in the PROSITE database
            are also included but some of these may be fortuitous. The length
            in codons is given for each CDS.

            Usually the highest scoring match found by fasta -o is given for
            CDS which show significant similarity to other CDS in the database.
            The position of possible ribosome binding site sequences are given
            where these have been used to deduce the initiation codon. Gene
            prediction is based on positional base preference in codons using a
            specially developed Hidden Markov Model (Krogh et al., Nucleic
            Acids Research, 22(22):4768-4778(1994)) and the FramePlot program
            of Bibb et al., Gene 30:157-66(1984) as implemented at
            http://www.nih.go.jp/~jun/cgi-bin/frameplot.pl. CAUTION: We may
            not have predicted the
            correct initiation codon. Where possible we choose an initiation
            codon (atg, gtg, ttg or (att)) which is preceded by an upstream
            ribosome binding site sequence (optimally 5-13bp before the
            initiation codon). If this cannot be identified we choose the most
            upstream initiation codon.

            IMPORTANT: This sequence MAY NOT be the entire insert of the
            sequenced clone. It may be shorter because we only sequence
            overlapping sections once, or longer, because we arrange for a
            small overlap between neighbouring submissions. Cosmid 6G10 lies
            between and overlaps with cosmids 6E10 and 5F7 on the AseI-C
            genomic restriction fragment.
FEATURES             Location/Qualifiers
     source          1..36734
                     /organism="Streptomyces coelicolor A3(2)"
                     /strain="A3(2)"
                     /db_xref="taxon:100226"
                     /clone="cosmid 6G10"
     gene            complement(1..120)
                     /gene="SC6G10.01c"
     CDS             complement(<1..120)
                     /gene="SC6G10.01c"
                     /note="SC6G10.01c, partial CDS, unknown, len: >40aa"
                     /codon_start=1
                     /transl_table=11
                     /label=SC6G10.01c
                     /product="hypothetical protein"
                     /protein_id="CAB39855.1"
                     /db_xref="GI:4539197"
                     /translation="MARAGRSGTPHRRAANPPPHRHPGTADAPVPFTPMRTLLI"
     misc_feature    1..103
                     /note="Nominal overlap with cosmid 6E10."
     gene            complement(232..666)
                     /gene="SC6G10.02c"
     CDS             complement(232..666)
                     /gene="SC6G10.02c"
                     /note="SC6G10.02c, unknown, len: 144aa; similar to
                     hypothetical proteins from Mycobacterium eg. TR:O53519
                     (EMBL:AL021957) hypothetical protein from Mycobacterium
                     tuberculosis (144 aa) fasta scores; opt: 479, z-score:
                     620.2, E(): 3.1e-27, (49.0% identity in 143 aa overlap)."
                     /codon_start=1
                     /transl_table=11
                     /label=SC6G10.02c
                     /product="hypothetical protein"
                     /protein_id="CAB39856.1"
                     /db_xref="GI:4539198"
                     /translation="MAEHTSSSITIEAAPADVMAVIADFARYPDWTGEVKEAQVLATD
                     EQGRAEQVRLVMDAGAIKDDQTLGYTWTGEHEVSWTLVKSQMLRSLDGSYLLRPAGTG
                     TEVTYRLTVDVKIPMLGMIKRKAEKVIIDRALAGLKKRVESK"

     gene            complement(804..1601)
                     /gene="SC6G10.03c"
     CDS             complement(804..1601)
                     /gene="SC6G10.03c"
                     /note="SC6G10.03c, unknown, len: 265aa; weak similarity to
                     many eg. TR:P95860 (EMBL:Y08256) hypothetical protein (309
                     aa) fasta scores; opt: 152, z-score: 186.9, E(): 0.0041,
                     (24.8% identity in 270 aa overlap)."
                     /codon_start=1
                     /transl_table=11
                     /label=SC6G10.03c
                     /product="hypothetical protein"
                     /protein_id="CAB39857.1"
                     /db_xref="GI:4539199"
                     /translation="MAPTPPRNTSTRVHWSDVHGNARDLARAGDGADALICLGDLVL
                     FLDYADHSRGIFPDLFGVANADRIVALRTARRFEEAREFGRRLWAEAGGEPRELIERA
                     VRKQYAELFAAFPTPTYATYGNVDVPPLWPEYAGPGTTVLDGERVEIGGRVFGFVGGG
                     LRTPMNTPYEISDEEYAAKIEAVGEVDVLCTHIPPEVPELVYDTVARRFERGSRALLD
                     AIRRTRPRYALFGHVHQPLVRRMRVGATECVNVGHFASSGRPWALEW"

     gene            1912..3708
                     /gene="SC6G10.04"
     CDS             1912..3708
                     /gene="SC6G10.04"
                     /note="SC6G10.04, probable long chain fatty acid coA
                     ligase, len: 598aa; similar to many eg. TR:E1359128
                     (EMBL:AL034443) putative long chain fatty acid coA ligase
                     from Streptomyces coelicolor (608 aa) fasta scores; opt:
                     1347, z-score: 1506.7, E(): 0, (51.3% identity in 608 aa
                     overlap) and SW:LCFB_RAT long chain fatty acid coA ligase
                     from Rattus norvegicus (Rat) (699 aa) fasta scores; opt:
                     568, z-score: 634.7, E(): 4.7e-28, (30.2% identity in 589
                     aa overlap). Contains Pfam match to entry PF00501
                     AMP-binding, AMP-binding enzyme and Prosite match to
                     PS00455 Putative AMP-binding domain signature."
                     /codon_start=1
                     /transl_table=11
                     /label=SC6G10.04
                     /product="putative long chain fatty acid coA ligase"
                     /protein_id="CAB39858.1"
                     /db_xref="GI:4539200"
                     /translation="MREFSLPALYEVPADGNLTDIVRRNAAQHPDVAVIARKVGGVWQ
                     DVTARAFLAEVHSAAKGLIASGVQPGDRVGLMSRTRYEWTLLDFAIWSAGAITVPVYE
                     TSSPEQVQWILGDSGATACVVESAGHAAAVESVREQLPALKNVWQIDAGAVEELGRLG
                     QDVTDRTVEERGSIAKADDPATIVYTSGTTGRPKGCVLTHRSFFAECGNVVERLRPLF
                     RTGECSVLLFLPLAHVFGRLVQVAPMIAPIKLGNVPDIKNLTDELAAFRPTLILGVPR
                     VFEKVYNSARAKAQADGKGKIFDKAADTAIAYSKALDAPSGPSVGLKIKHKVFDKLVY
                     SKLRTVLGGRGEYAISGGAPLGERLGHFFRGIGFTVLEGYGLTESCAATAFNPWDRQK
                     IGTVGQPLPGSVVRIADDGEVLLHGEHLFKEYWNNPGATAEALADGWFHTGDIGTLDE
                     DGYLRITGRKKEIIVTAGGKNVAPAVMEDRIRAHALVAECMVVGDGRPFVGALVTIDE
                     EFLGRWCAEHGKPAGSTAVSLREDPELLAAIQDAVDDGNAAVSKAESVRKFRVLGAQF
                     TEDSGHLTPSLKLKRNVVAKDYADEIEAIYSK"
     misc_feature    2050..3387
                     /gene="SC6G10.04"
                     /note="Pfam match to entry PF00501 AMP-binding,
                     AMP-binding enzyme, score -28.50, E-value 3.2e-13."
     misc_feature    2458..2493
                     /gene="SC6G10.04"
                     /note="PS00455 Putative AMP-binding domain signature."
     gene            complement(3696..4934)
                     /gene="SC6G10.05c"
     CDS             complement(3696..4934)
                     /gene="SC6G10.05c"
                     /note="SC6G10.05c, possible glycosyl transferase, len:
                     412aa; similar to TR:O53522 (EMBL:AL021957) hypothetical
                     protein from Mycobacterium tuberculosis (399 aa) fasta
                     scores; opt: 1276, z-score: 1452.2, E(): 0, (51.9%
                     identity in 391 aa overlap) and SW:WCAL_SALTY putative
                     colanic acid biosynthesis glycosyl transferase from the
                     rfb (O antigen) gene cluster of Salmonella typhimurium
                     (406 aa) fasta scores; opt: 291, z-score: 334.2, E():
                     2.6e-11, (29.0% identity in 300 aa overlap). Contains Pfam
                     match to entry PF00534 Glycos_transf_1, Glycosyl
                     transferases group 1."
                     /codon_start=1
                     /transl_table=11
                     /label=SC6G10.05c
                     /product="putative glycosyl transferase"
                     /protein_id="CAB39859.1"
                     /db_xref="GI:4539201"
                     /translation="MRKTLIVTNDFPPRPGGIQAFLHNMALRLDPERLVVYASTWKRG
                     REGIEATAAFDAEQPFTVVRDRTTMLLPTPGATRRAVGLLREHGCTSVWFGAAAPLGL
                     MAPALRRAGAERLVATTHGHEAGWAQLPAARQLLRRIGESTDTITYLGEYTRSRIAGA
                     LTPGAAARMVQLPPGVDEKTFHPASGGDEVRARLGFTDRPVVVCVSRLVPRKGQDTLI
                     RAMPRILAAEPDAVLLIVGGGPYEKDLRRLAEETGVAAAVHFTGAVPWSELPAHYGAG
                     DVFAMPCRTRRGGLDVEGLGIVYLEASATGLPVVAGDSGGAPDAVLDGETGWVVRGED
                     PNESADRITTLLADPELRRRMGERGRAWVEEKWRWDLLAEHLRTLLQGGSAARARQAT
                     DNVGPPTNRTRHPGRRPYLE"
     misc_feature    complement(3849..4322)
                     /gene="SC6G10.05c"
                     /note="Pfam match to entry PF00534 Glycos_transf_1,
                     Glycosyl transferases group 1, score 86.70, E-value
                     1.1e-23."
     gene            5091..6356
                     /gene="SC6G10.06"
     CDS             5091..6356
                     /gene="SC6G10.06"

Query Match 9.6%; Score 39.6; DB 1; Length 36734; 

Best Local Similarity 48.6%; Pred. No. 3.9; 

Matches 108; Conservative 0; Mismatches 114; Indels 0; Gaps 0;

Qy 162 ccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaa 221 

II I I I I I I I I I I I I I I I I I I I I I I II I III III 

Db 21826 CCCGTTCATCGAGTCCTGGGTCACCGGCGACAAGCGCGAGCACCACATCCTGGACCGCCC 21767 

Qy 222 tcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaa 281 

I I I I I I I I I II I I I I I I I I I I I II II I I I I I I 

Db 21766 GCGCAACGCCCCGACCCGTACGGCCTTCGGTGTCGCCTGGCTGACCGTCTACTTCGTGCT 21707 

Qy 282 gacaagcactgtcttcttccccttctatgcaggtatccttggatggccagtcgcagccgc 341 

I I I I I I I I I I I I I II I I I I II. 

Db 21706 GCTGATCGGTGGCGGCAACGACCTGTGGGCCACCCACTTCCACCTGTCGATCAACGCGAT 21647 



Qy 342 ctggtggttcaacggaaacatgtgactcttccaaatggaagt 383

I I I I I I I I I I I II I I I I I II II 
Db 21646 CACCTGGTTCGTCCGCATCGCGTTCTTCGTCGGACCGGTCGT 21605



RESULT 13 
AC023212/C 
LOCUS       AC023212 78220 bp DNA HTG 13-JUL-2000
DEFINITION  Homo sapiens clone RP11-758K16, LOW-PASS SEQUENCE SAMPLING.
ACCESSION   AC023212
VERSION     AC023212.2 GI:9164272
KEYWORDS    HTG; HTGS_PHASE0.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 78220)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens, clone RP11-758K16
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 78220)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Graham,H., Allen,N.,
            Anderson,S., Baldwin,J., Barna,N., Beckerly,R., Beda,F.,
            Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G., Castle,A.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
            DeArellano,K., Dewar,K., Domino,M., Doyle,M., Fenestor,J.,
            Ferreira,P., FitzHugh,W., Forrest,C., Gage,D., Galagan,J.,
            Gardyna,S., Grant,G., Hagos,B., Heaford,A., Horton,L.,
            Howland,J.C., Johnson,R., Jones,C., Kann,L., Karatas,A., Klein,J.,
            Landers,T., Lehoczky,J., Levine,R., Lieu,C., Liu,G., Locke,K.,
            Macdonald,P., Marquis,N., McEwan,P., McGurk,A., McKernan,K.,
            McPheeters,R., Meldrim,J., Meneus,L., Morrow,J., Naylor,J.,
            Norman,C.H., O'Connor,T., O'Donnell,P., Olivar,T.M., Peterson,K.,
            Pierre,N., Pisani,C., Pollara,V., Raymond,C., Riley,R., Rothman,D.,
            Roy,A., Santos,R., Severy,P., Spencer,B., Stange-Thomann,N.,
            Stojanovic,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Tirrell,A., Vassiliev,H., Viel,R., Vo,A., Wu,X., Wyman,D., Ye,W.J.,
            Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-FEB-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Jul 13, 2000 this sequence version replaced gi:6957758.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/ MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L6564
            Center clone name: 758 K 16

            * NOTE: This record contains 89 individual
            * sequencing reads that have not been assembled into
            * contigs. Runs of N are used to separate the reads
            * and the order in which they appear is completely
            * arbitrary. Low-pass sequence sampling is useful for
            * identifying clones that may be gene-rich and allows
            * overlap relationships among clones to be deduced.
            * However, it should not be assumed that this clone
            * will be sequenced to completion. In the event that
            * the record is updated, the accession number will
            * be preserved.



* 1 765: contig of 765 bp in length
* 766 865: gap of 100 bp
* 866 1651: contig of 786 bp in length
* 1652 1751: gap of 100 bp
* 1752 2513: contig of 762 bp in length
* 2514 2613: gap of 100 bp
* 2614 3394: contig of 781 bp in length
* 3395 3494: gap of 100 bp
* 3495 4270: contig of 776 bp in length
* 4271 4370: gap of 100 bp
* 4371 5160: contig of 790 bp in length
* 5161 5260: gap of 100 bp
* 5261 6049: contig of 789 bp in length
* 6050 6149: gap of 100 bp
* 6150 6933: contig of 784 bp in length
* 6934 7033: gap of 100 bp
* 7034 7801: contig of 768 bp in length
* 7802 7901: gap of 100 bp
* 7902 8736: contig of 835 bp in length
* 8737 8836: gap of 100 bp
* 8837 9629: contig of 793 bp in length
* 9630 9729: gap of 100 bp
* 9730 10507: contig of 778 bp in length
* 10508 10607: gap of 100 bp
* 10608 11385: contig of 778 bp in length
* 11386 11485: gap of 100 bp
* 11486 12268: contig of 783 bp in length
* 12269 12368: gap of 100 bp
* 12369 13151: contig of 783 bp in length
* 13152 13251: gap of 100 bp
* 13252 14017: contig of 766 bp in length
* 14018 14117: gap of 100 bp
* 14118 14905: contig of 788 bp in length
* 14906 15005: gap of 100 bp
* 15006 15786: contig of 781 bp in length
* 15787 15886: gap of 100 bp
* 15887 16664: contig of 778 bp in length
* 16665 16764: gap of 100 bp
* 16765 17541: contig of 777 bp in length
* 17542 17641: gap of 100 bp
* 17642 18383: contig of 742 bp in length
* 18384 18483: gap of 100 bp
* 18484 19266: contig of 783 bp in length
* 19267 19366: gap of 100 bp
* 19367 20128: contig of 762 bp in length
* 20129 20228: gap of 100 bp
* 20229 21009: contig of 781 bp in length
* 21010 21109: gap of 100 bp

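* 21110 21881: contig of 772 bp in length
* 21882 21981: gap of 100 bp
* 21982 22831: contig of 850 bp in length
* 22832 22931: gap of 100 bp
* 22932 23699: contig of 768 bp in length
* 23700 23799: gap of 100 bp
* 23800 24570: contig of 771 bp in length
* 24571 24670: gap of 100 bp
* 24671 25428: contig of 758 bp in length
* 25429 25528: gap of 100 bp
* 25529 26301: contig of 773 bp in length
* 26302 26401: gap of 100 bp
* 26402 27167: contig of 766 bp in length
* 27168 27267: gap of 100 bp
* 27268 28048: contig of 781 bp in length
* 28049 28148: gap of 100 bp
* 28149 28920: contig of 772 bp in length
* 28921 29020: gap of 100 bp
* 29021 29796: contig of 776 bp in length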
Score 39.6; DB 2; Length 78220;

Pred. No. 3.8;

Conservative 0; Mismatches 93; Indels 0; Gaps 0;



Qy 82 accaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaaca 141 

Ml I I I I I III I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 68204 ACCCACCCCCCCCCCCCCCCCCCCCCCCCCCCCAAACACACCACCCCCACCCCCCCAACA 68145 



Qy 142 atgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagag 201

I I I I I I I I I I I I I I I I I I I 

Db 68144 AAAAAAACAAAACCACNCCCCCCCCCCCCCCCCCCCGCCNCCCCCCCCCCCCCCCCCCCC 68085 

Qy 202 ccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgac 261 

I I II I I I I I I I I I II III I I I I I I I I I I I I I I I 
Db 68084 CCCCCCCCCCCCCCCCCCACCNCCCCCCCCCCCACCNCCCCCCCCCCCCCCCCCCCCCCC 68025 

Qy 262 ctctcc 267 
I I I I 

Db 68024 CCCCCC 68019 



RESULT 14 
AC080179/C 
LOCUS       AC080179 54450 bp DNA HTG 16-JAN-2001
DEFINITION  Homo sapiens chromosome X clone RP11-266P16 map X, LOW-PASS
            SEQUENCE SAMPLING.
ACCESSION   AC080179
VERSION     AC080179.2 GI:12232512
KEYWORDS    HTG; HTGS_PHASE0.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 54450)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome X, clone RP11-266P16
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 54450)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Barna,N., Bastien,V., Beda,F., Boguslavkiy,L.,
            Boukhgalter,B., Brown,A., Burkett,G., Campopiano,A., Castle,A.,
            Choepel,Y., Colangelo,M., Collins,S., Collymore,A., Cooke,P.,
            DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S., Ferreira,P.,
            FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., Ginde,S., Goyette,M.,
            Graham,L., Grand-Pierre,N., Hagos,B., Heaford,A., Horton,L.,
            Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A., LaRocque,K.,
            Lamazares,R., Landers,T., Lehoczky,J., Levine,R., Lieu,C., Liu,G.,
            Macdonald,P., Marquis,N., McCarthy,M., McEwan,P., McKernan,K.,
            McPheeters,R., Meldrim,J., Meneus,L., Mihova,T., Mlenga,V.,
            Morrow,J., Murphy,T., Naylor,J., Norman,C.H., O'Connor,T.,
            O'Donnell,P., O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K.,
            Pierre,N., Pisani,C., Pollara,V., Raymond,C., Rieback,M., Riley,R.,
            Rogov,P., Rothman,D., Roy,A., Santos,R., Schauer,S., Severy,P.,
            Sougnez,C., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
            Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
            Tirrell,A., Travers,M., Trigilio,J., Vassiliev,H., Viel,R., Vo,A.,
            Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G., Zainoun,J.,
            Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-SEP-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Jan 16, 2001 this sequence version replaced gi:10334899.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/ MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L11199
            Center clone name: 266 P 16

            * NOTE: This record contains 67 individual
            * sequencing reads that have not been assembled into
            * contigs. Runs of N are used to separate the reads
            * and the order in which they appear is completely
            * arbitrary. Low-pass sequence sampling is useful for
            * identifying clones that may be gene-rich and allows
            * overlap relationships among clones to be deduced.
            * However, it should not be assumed that this clone
            * will be sequenced to completion. In the event that
            * the record is updated, the accession number will
            * be preserved.
* 39910 40009: gap of 100 bp
* 40010 40695: contig of 686 bp in length
* 40696 40795: gap of 100 bp

* 43147 43246: gap of 100 bp
* 43247 43972: contig of 726 bp in length
* 43973 44072: gap of 100 bp
* 44073 44791: contig of 719 bp in length
* 44792 44891: gap of 100 bp
* 44892 45608: contig of 717 bp in length
* 45609 45708: gap of 100 bp
* 45709 46431: contig of 723 bp in length
* 46432 46531: gap of 100 bp
* 46532 47231: contig of 700 bp in length
* 47232 47331: gap of 100 bp
* 47332 48036: contig of 705 bp in length
* 48037 48136: gap of 100 bp
* 48137 48852: contig of 716 bp in length
* 48853 48952: gap of 100 bp
* 48953 49694: contig of 742 bp in length
* 49695 49794: gap of 100 bp
* 49795 50475: contig of 681 bp in length
* 50476 50575: gap of 100 bp
* 50576 51276: contig of 701 bp in length
* 51277 51376: gap of 100 bp
* 51377 52115: contig of 739 bp in length
* 52116 52215: gap of 100 bp
* 52216 52898: contig of 683 bp in length
* 52899 52998: gap of 100 bp
* 52999 53731: contig of 733 bp in length
* 53732 53831: gap of 100 bp
* 53832 54450: contig of 619 bp in length
FEATURES             Location/Qualifiers
     source          1..54450
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="X"
Query Match 9.5%; Score 39.2; DB 2; Length 54450;

Best Local Similarity 44.1%; Pred. No. 4.9;

Matches 86; Conservative 0; Mismatches 109; Indels 0; Gaps 0;



Qy 73 cctagcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaacc 132 

III I I I I I I I I I I I II I I I I M I I I I III I I I I I II 
Db 47218 CCCANACACACAAAAAAAACCAACCCCACCCAACCAACCACCACACAAAACCACCACCCC 47159

Qy 133 accacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggc 192 

I I I I I I I I I I I I I I I I I I I 

Db 47158 CCCCCNNCNNCCCCCCNCCCACNNCCCCCCCCCCCCCCCCNNNNCCNCCCCCCCCCCCCN 47099 

Qy 193 atggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgc 252 

I I I I I II I I I I I I I I I I I I I I I I 

Db 47098 CCCNNCNCCCCCCCCCCCCCCCCCNCCCCACCCNCCCCCCCNCCCCCNNCCNNNNNNNCC 47039 



Qy 253 gccgccgacctctcc 267 

I I I I I I I I 
Db 47038 NCCCCCCCNCCCCCC 47024 



RESULT 15 
AE005747/C 



LOCUS       AE005747 11070 bp DNA BCT 28-MAR-2001
DEFINITION  Caulobacter crescentus section 73 of 359 of the complete genome.
ACCESSION   AE005747 AE005673
VERSION     AE005747.1 GI:13421943
KEYWORDS
SOURCE      Caulobacter crescentus.
  ORGANISM  Caulobacter crescentus
            Bacteria; Proteobacteria; alpha subdivision; Caulobacter group;
            Caulobacter.
REFERENCE   1 (bases 1 to 11070)
  AUTHORS   Nierman,W.C., Feldblyum,T.V., Laub,M.T., Paulsen,I.T., Nelson,K.E.,
            Eisen,J., Heidelberg,J.F., Alley,M.R.K., Ohta,N., Maddock,J.R.,
            Potocka,I., Nelson,W.C., Newton,A., Stephens,C., Phadke,N.D.,
            Ely,B., DeBoy,R.T., Dodson,R.J., Durkin,A.S., Gwinn,M.L.,
            Haft,D.H., Kolonay,J.F., Smit,J., Craven,M., Khouri,H., Shetty,J.,
            Berry,K., Utterback,T., Tran,K., Wolf,A., Vamathevan,J.,
            Ermolaeva,M., White,O., Salzberg,S.L., Venter,J.C., Shapiro,L. and
            Fraser,C.M.
  TITLE     Complete genome sequence of Caulobacter crescentus
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 98 (7), 4136-4141 (2001)
  MEDLINE   21173698
REFERENCE   2 (bases 1 to 11070)
  AUTHORS   Nierman,W.C., Feldblyum,T.V., Paulsen,I.T., Nelson,K.E., Eisen,J.,
            Heidelberg,J.F., Alley,M.R.K., Ohta,N., Maddock,J.R., Potocka,I.,
            Nelson,W.C., Newton,A., Stephens,C., Phadke,N.D., Ely,B.,
            Laub,M.T., DeBoy,R.T., Dodson,R.J., Durkin,A.S., Gwinn,M.L.,
            Haft,D.H., Kolonay,J.F., Smit,J., Craven,M., Khouri,H., Shetty,J.,
            Berry,K., Utterback,T., Tran,K., Wolf,A., Vamathevan,J.,
            Ermolaeva,M., White,O., Salzberg,S.L., Shapiro,L., Venter,J.C. and
            Fraser,C.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-JAN-2001) The Institute for Genomic Research, 9712
            Medical Center Dr, Rockville, MD 20850, USA
FEATURES             Location/Qualifiers
     source          1..11070
                     /organism="Caulobacter crescentus"
                     /db_xref="taxon:69394"
     gene            complement(110..811)
                     /gene="CC0717"
     CDS             complement(110..811)
                     /gene="CC0717"
                     /note="similar to GB:AE000513; identified by sequence
                     similarity; putative"
                     /codon_start=1
                     /transl_table=11
                     /product="nodulin-related protein"
                     /protein_id="AAK22702.1"
                     /db_xref="GI:13421944"
                     /translation="MTRLKVHVERHAVSRIGWLRAAVLGANDGIVSTASLVVGVAAAE
                     ATRGPILLAAGAGLVAGAMSMAAGEYVSVASQADSEAADLARERAELATQPEEELEEM
                     TAIYVARGLTPDLARQVAEQLNAGDALAAHARDELGISEHVTARPVQAALTSAATFAV
                     GAAMPLVVSLLAPLPVIIPTISVATLVFLAVLGWLGARTGGASPWKPMLRVTFWGALA
                     LLVTAVIGKLFGAW"

     gene            complement(959..1435)
                     /gene="CC0718"
     CDS             complement(959..1435)
                     /gene="CC0718"
                     /note="identified by Glimmer2; putative"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="AAK22703.1"
                     /db_xref="GI:13421945"
                     /translation="MGSAPRAAQGAAVVAGDERHSRAVAVIPPRDLHEHHREQLDALV
                     AACVLIAQVDGAVTPDERGRMVERLKLHAGLEGADLEAALRAFEALDARFDARPEETW
                     VEAELMIRRLKGSAEAEGVALAAVAVSIDGGLEAEERAAVLDICAWLGVSPVRVLP"

     gene            complement(1357..1626)
                     /gene="CC0719"
     CDS             complement(1357..1626)
                     /gene="CC0719"
                     /note="identified by Glimmer2; putative"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="AAK22704.1"
                     /db_xref="GI:13421946"
                     /translation="MLGPGGLPPLAIPLARRRARPSTPEERPLTLLNARQLRAEIARL
                     TTALHEEARKPTPDRGLMRLWDLRRERLKEQLWSQETNATAARWR"

     gene            complement(1771..2829)
                     /gene="CC0720"
     CDS             complement(1771..2829)
                     /gene="CC0720"
                     /note="identified by match to protein family HMM"
                     /codon_start=1
                     /transl_table=11
                     /product="asparaginyl-tRNA synthetase, putative"
                     /protein_id="AAK22705.1"
                     /db_xref="GI:13421947"
                     /translation="MSLPSSSPGAPWWRAGVHADRRPFLLARNRITKAFRAWFEVQGF
                     EEVEAAALQVSPGNEAHLHAFATQALTIAGEASPLYLHTSPEFACKKLLAAGENKIFS
                     LGKVWRNRERGPLHHPEFTMLEWYRAT^APYQTLMDDCAVLLALAAETAGTQRFAFRGM
                     TCDPFAAPERLSVAEAFQRHAGVDLLASVAADGSTDRDALTAQARAAGIRVAEDDGWA
                     DIFSRIIVEKVEPRLGEGRATILCEYPISEAALARPKDADPRVAERFELYACGVELAN
                     AFGELTDPAEQRRRFLIEMDEKARIYGERYPIDEDFLAALAHMPPASGSALGFDRLVL
                     LATGARRIEDVVWTPVAG"

     gene            2950..3516
                     /gene="CC0721"
     CDS             2950..3516
                     /gene="CC0721"
                     /note="identified by match to TIGR protein family HMM
                     TIGR00038"
                     /codon_start=1
                     /transl_table=11
                     /product="translation elongation factor P"
                     /protein_id="AAK22706.1"
                     /db_xref="GI:13421948"
                     /translation="MKVAASSLRKGSVVDMDGKLYVVLSAENIHPGKGTPVTQLNMRR
                     ISDGVKVSERYRTTEQVERAFVDDRNHTFLYSDGDGYHFMNPESYDQLVATEDVIGDA
                     APYLQEGMTVILSTHNDVPIAIDLPRTVVLEIVDTEPSVKGQTASSSYKPAVLSNGVK
                     TTVPPYITAGTKVVILTEDGSYVERAKD"

gene complement(3716..6190)
/gene="CC0722"
CDS complement(3716..6190)
/gene="CC0722"
/note="identified by match to protein family HMM"
/codon_start=1
/transl_table=11
/product="TonB-dependent receptor"
/protein_id="AAK22707.1"
/db_xref="GI:13421949"
/translation="MRTKTKMTGHAALLCGAALCALLATAAQAAETPATATPAASPDT

DPNSVEQVIITSTRETRSAVAMEAQQIQRVLPGASPLKAIQFLPGVIYRTADPWGNNE 

QNLSLFVHGFSTQQLGYTMDGVPLGDQQYGNYNGLSVSRAVTSENVSRVTLSSGAGSL 

GVPSTSNLGGAIETFSRDPGAERSLTVNQTFGSYEASRTFLRAESGEMAGGFKGYLSY 

LRQDAKAWDFDGRQRGHQVNLKLVREGEKGDLTFYANWQMKVEPNEDATAMGNQQTAA 

AQNFKPYSRAFIYPDLAGCTANLTGGPGTPPPAQGNNFSNCFSAAQREDILTYISYAW 

KPTETLTWTNQAYYHYNFGRGIVSGPVNTAGLPGLFATYYPNLVVGTTTSPTTLTNIV 

NLFGGSGYAVRTTEYRINRPGLISTVEWTLGDHQIKGGLWFEHNEPAQQRAWYPMTKA 

NNNLSPYDVPRGNKVLTQYANAFFINNLQLHLQDQWRVTPKLLVQAGFKSSLQDATGK 

FPVNQKNLPTVAVPVQFPTGSIESNKWFLPQAGLLWDATENDQVFVNVQQNMRQFIPY 

GAGSGFYGFSAWSLGTQAAFELFKKTVKPETSWTYEAGLRSRREVAWGPITGIEGQIN 

AYNVKFSNRLLNIAPFNFINPAPAILANVGGVTTKGVDLAGTIKFGPRFQIYNAVSFN

ESKYDSDYQSGTTLVNGVSTPVTVATGGKFVPLTPRWMNKTIARVNLGKFDAQISGDY 

IGKRPVTYLNDLSVKAMMLVDLQAGYSFDIADGGMLKGLRLTANVTNLFDKKGVSTAG 

VTSNSGGYVAFPQPPRMGFVTVSAEF" 

gene complement(6325..8100)
/gene="CC0723"
CDS complement(6325..8100)
/gene="CC0723"
/note="identified by match to protein family HMM"
/codon_start=1
/transl_table=11
/product="sensor histidine kinase / response regulator"
/protein_id="AAK22708.1"
/db_xref="GI:13421950"
/translation="MRPFATCKGAARTRSLVSFPRTSAALPRFAEWLDQDVTSVTARS
LPTLGVRLAICT^ATALIYALAVDLQGGLLWGLAVASAEAAVFICSSPQRADRKISHAQ

RLSYVAAVAWMNIVWWSLAIMLWRQDHPALRMAALCVVCAFLVHAQAFTARSKTLLLI 

VGGGSATLLLVLCGVLNDFPPSERLALCAAAIILIIYTAKAAQTNGQQGRALELAKSQ 

AEAASQAKSEFLALMSHELRTPLNGMLGLSQALKLEPLEPGEREQVELLEESGRTLLA 

LLNDVLDMAKIEAGKLDIAPTQEDLSRLAERVVRINQAQAREKGTEITLEIDPATPRA 

LLFDPLRVRQCLGNLVSNAVKFTPAGQIRVRVTCEAGDQPDRMLAKIIVSDTGVGMGP 

AVLARIFTPFEQADPTIARRTGGTGLGLNITRRLAQMMGGSVSVRSTEGKGSTFTLTF 

GCRLPSAGAAETGGPLSGEHPLRLLVVDDYAVNRKVIAMMLAPLGCEIVEADNGQRAL 

DLLAEREVDAVLLDFNMPVMGGLETTRRLRADPRWRKLPIVCLTAGVMDDERAAAATA 

GMDGFIEKPIEMSTLVSTLARVARR" 

gene complement(8170..8379)
/gene="CC0724"
CDS complement(8170..8379)
/gene="CC0724"
/note="identified by Glimmer2; putative"
/codon_start=1
/transl_table=11
/product="hypothetical protein"
/protein_id="AAK22709.1"
/db_xref="GI:13421951"
/translation="MIEELDTVALLIDKPALGLEKGLIGAVVYVHGAHEAFEVEFVDD

DGHTFCLETFKPEELLKIHRRAKAA" 

gene complement(8376..8720)
/gene="CC0725"
CDS complement(8376..8720)
/gene="CC0725"
/note="identified by Glimmer2; putative"
/codon_start=1
/transl_table=11
/product="conserved hypothetical protein"
/protein_id="AAK22710.1"
/db_xref="GI:13421952"
/translation="MKLPGGESVQIDPRKLVDYALSPTHPVGKHKARLFASRLGLTQA

DAPALEAALRIAAASQDATLIKSDAWGQRWQIDFMFQFGALSAMITSGWIVPADGSAV 

RLTSVFIARDPG" 
gene complement(8725..9666)
/gene="CC0726"
CDS complement(8725..9666)
/gene="CC0726"
/note="identified by match to PFAM protein family HMM
PF00766"
/codon_start=1
/transl_table=11
/product="electron transfer flavoprotein, alpha subunit"
/protein_id="AAK22711.1"
/db_xref="GI:13421953"
/translation="MAVLVVADNDNALLRDATHKTVTAALKISGDVDVLVLGKGAKAV
ADQAAKIAGVRKVLLAESDALGHGVAEAQADAVLALAGNYDAILVPATSGGKNFAPRV 
7^AKLDVAPISEIVDVVAADTFTRPIYAGNALETVQSSDSKKVITVRPTAFAAAAEGGS 
ASVESVSGADAAKTRFVSEEMVKSDRPELAAAKIWSGGRAMGSAEEFQRVIEPLADK 
LGAAVGASR7^AVDAGYAPNDYQVGQTGKVVAPQLYVAIGISGAIQHLAGMKDSKVIVA 
INKDADAPIFQVADFGLVADYKSAVPELMDALAAAGK" 
gene complement(9666..10412)

/gene="CC0727" 

Query Match 9.5%; Score 39; DB 1; Length 11070;

Best Local Similarity 51.4%; Pred. No. 5.8; 

Matches 90; Conservative 0; Mismatches 85; Indels 0; Gaps 0; 

Qy 105 ccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaacccaggcccg 164 

Mill I I I I I I I I I I I I I I I I I III II 

Db 2860 CGAAATCTCCAGCCTCTTCGGTTTTGCCGCCTTGTCCCTGCCCTCCTCCTCCCCCGGCGC 2801

Qy 165 tctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcc 224 

I I I I I I I I I I I I I II III I I I I I I I I I I J I 

Db 2800 CCCATGGTGGCGCGCCGGCGTCCACGCTGACCGCCGGCCGTTCCTCCTGGCCCGCAACCG 2741

Qy 225 catcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtg 279

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I 
Db 2740 GATCACCAAGGCGTTCCGCGCCTGGTTCGAGGTTCAGGGCTTCGAGGAAGTCGAG 2686



Search completed: February 1, 2002, 11:10:46 
Job time: 10172 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on:          February 7, 2002, 11:00:07 ; Search time 428.31 Seconds
                 (without alignments)
                 822.677 Million cell updates/sec

Title:           US-09-394-745-6603
Perfect score:   411
Sequence:        1 agcaaaagcatagagatcca ... aggagaagaggaagggaccg 411

Scoring table:   IDENTITY_NUC
                 Gapop 10.0 , Gapext 1.0

Searched:        930621 seqs, 428662619 residues

Total number of hits satisfying chosen parameters: 1861242

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries



Database: N_Geneseq_1101:*
1: /SIDS2/gcgdata/geneseq/geneseqn/NA1980.DAT:*
2: /SIDS2/gcgdata/geneseq/geneseqn/NA1981.DAT:*
3: /SIDS2/gcgdata/geneseq/geneseqn/NA1982.DAT:*
4: /SIDS2/gcgdata/geneseq/geneseqn/NA1983.DAT:*
5: /SIDS2/gcgdata/geneseq/geneseqn/NA1984.DAT:*
6: /SIDS2/gcgdata/geneseq/geneseqn/NA1985.DAT:*
7: /SIDS2/gcgdata/geneseq/geneseqn/NA1986.DAT:*
8: /SIDS2/gcgdata/geneseq/geneseqn/NA1987.DAT:*
9: /SIDS2/gcgdata/geneseq/geneseqn/NA1988.DAT:*
10: /SIDS2/gcgdata/geneseq/geneseqn/NA1989.DAT:*
11: /SIDS2/gcgdata/geneseq/geneseqn/NA1990.DAT:*
12: /SIDS2/gcgdata/geneseq/geneseqn/NA1991.DAT:*
13: /SIDS2/gcgdata/geneseq/geneseqn/NA1992.DAT
14: /SIDS2/gcgdata/geneseq/geneseqn/NA1993.DAT
15: /SIDS2/gcgdata/geneseq/geneseqn/NA1994.DAT
16: /SIDS2/gcgdata/geneseq/geneseqn/NA1995.DAT
17: /SIDS2/gcgdata/geneseq/geneseqn/NA1996.DAT
18: /SIDS2/gcgdata/geneseq/geneseqn/NA1997.DAT
19: /SIDS2/gcgdata/geneseq/geneseqn/NA1998.DAT
20: /SIDS2/gcgdata/geneseq/geneseqn/NA1999.DAT
21: /SIDS2/gcgdata/geneseq/geneseqn/NA2000.DAT:*
22: /SIDS2/gcgdata/geneseq/geneseqn/NA2001.DAT:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
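
The percentage figures printed with each hit in this report appear to be simple ratios, and the sketch below checks that reading against values that occur in the listing. It is an illustration only, not GenCore's documented method: the function names are hypothetical, and counting indels in the denominator is an assumption that this report cannot confirm, since every hit shown has Indels 0. Pred. No. is not reproduced, because it depends on the full score distribution.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the perfect score (411 for this query)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, mismatches, indels=0):
    """Matches as a percentage of aligned positions.

    Including indels in the denominator is an assumption; all hits
    in this report have Indels 0, so it is untested here.
    """
    aligned = matches + mismatches + indels
    return round(100.0 * matches / aligned, 1)


if __name__ == "__main__":
    # Values taken from hits in this report.
    print(query_match_pct(44.2, 411))          # AAF09358
    print(query_match_pct(39, 411))            # AAA14666
    print(best_local_similarity_pct(94, 83))   # AAF09358
    print(best_local_similarity_pct(114, 125)) # AAA14666
```

Under these assumptions the computed values agree with the Query Match and Best Local Similarity figures listed for AAF09358 (10.8%, 53.1%) and AAA14666 (9.5%, 47.7%).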

SUMMARIES 



Result          Query
   No.  Score   Match  Length  DB  ID        Description

     1   44.2    10.8     729  21  AAF09358
     2     39     9.5    4674  21  AAA14666

ALIGNMENTS



RESULT 1 
AAF09358/C 

ID AAF09358 standard; cDNA; 729 BP.
XX

AC AAF09358; 
XX 

DT 13-MAR-2001 (first entry) 
XX 

DE Fusarium venenatum EST SEQ ID NO: 1881. 



XX 

KW Multiple gene expression; filamentous fungal cell; EST; 

KW expressed sequence tag; Fusarium venenatum; Aspergillus niger; 

KW Aspergillus oryzae; Trichoderma reesei; identification; recombination;

KW culture condition; environmental stress; spore morphogenesis; 

KW metabolic pathway engineering; catabolic pathway engineering; ss. 

XX 

OS Fusarium venenatum. 
XX 

PN WO200056762-A2. 
XX 

PD 28-SEP-2000. 
XX 

PF 22-MAR-2000; 2000WO-US07781.
XX 

PR 22-MAR-1999; 99US-0273623.
XX 

PA (NOVO ) NOVO NORDISK BIOTECH INC. 

PA (NOVO ) NOVO NORDISK AS. 

XX 

PI Berka RM, Rey MW, Shuster JR, Kauppinen S, Clausen IG, Olsen PB; 
XX 

DR WPI; 2000-594572/56. 
XX 

PT Monitoring differential expression of genes in filamentous fungal cells 

PT uses fluorescence-labeled nucleic acids isolated from the cells and a 

PT substrate of expressed sequence tags - 
XX 

PS Claim 86; Page 1085-1086; 3161pp; English. 
XX 

CC The present invention describes a method for monitoring differential 

CC expression of genes in a first filamentous fungal (FF) cell relative to 

CC expression of the same genes in one or more second filamentous fungal 

CC cells. The method uses fluorescence-labeled nucleic acids isolated from 

CC the FF cells and a substrate of expressed sequence tags (EST). The ESTs

CC are used in the methods for monitoring differential expression of genes 

CC in a first filamentous fungal (FF) cell relative to expression of the 

CC same genes in one or more second filamentous fungal cells. Monitoring 

CC the global expression of genes from FF cells allows the production 

CC potential of the microorganisms to be improved. New genes may be 

CC discovered, possible functions of unknown open reading frames can be 

CC identified and gene copy number variation and stability can be

CC monitored. The expression of genes can be used to study how FF cells 

CC adapt to changes in culture conditions, environmental stress, spore 

CC morphogenesis, recombination, metabolic or catabolic pathway 

CC engineering. Using ESTs provides several advantages over genomic or 

CC random cDNA clones including elimination of redundancy as one spot on an 

CC array equals one gene or open reading frame, and organisation of the 

CC microarrays based on function of the gene products to facilitate

CC analysis of the results. AAF07478 to AAF11247 represents ESTs from 

CC Fusarium venenatum; AAF11248 to AAF11853 represents ESTs from Aspergillus 

CC niger; AAF11854 to AAF14878 represents ESTs from Aspergillus oryzae; and 

CC AAF14879 to AAF15337 represents ESTs from Trichoderma reesei, which are 

CC all specifically claimed in the present invention. 

XX 

SQ Sequence 729 BP; 202 A; 118 C; 202 G; 206 T; 1 other; 



Query Match 10.8%; Score 44.2; DB 21; Length 729;
Best Local Similarity 53.1%; Pred. No. 0.0011;
Matches 94; Conservative 0; Mismatches 83; Indels 0; Gaps



Qy 184 gcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcac 243 

I I I I I I III I I I I I I I I I I I II I I I I II II II 

Db 459 GCCGGCCGAGCTATGGAATCTCACCCCTTCGAGCGTCTTCCCCGAACTCAGAAGCCCGCT 400

Qy 244 gcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactgtcttcttcccc 303 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 399 TCTCCTGATTACGCCAAGATGTTCAAGAGAGTTGGCAGCCAAGCCCTCTTCTTCTTCCCT 340 

Qy 304 ttctatgcaggtatccttggatggccagtcgcagccgcctggtggttcaacggaaac 360 

II III I I I I I I II I I I I I I I I I I I I III I I I I 

Db 339 GGCTTCGCTGTCATCCTTGGCTGGCCTTTGGCTGCCAGTATGCCTTTGACGGTAGAC 283 



RESULT 2
AAA14666
ID AAA14666 standard; DNA; 4674 BP.
XX
AC AAA14666;
XX
DT 08-AUG-2000 (first entry)
XX
DE Nucleotide sequence of modified FK-520 PKS gene cluster module
XX
KW FK-520; polyketide synthase; PKS; gene cluster; immunosuppressant;
KW immunophilin; FK-506 binding protein; polyketide compound; uveitis;
KW transplant rejection; graft-versus-host disease; alopecia universalis;
KW autoimmune chronic active hepatitis; inflammatory bowel disease;
KW multiple sclerosis; primary biliary cirrhosis; scleroderma;
KW neurite outgrowth; nerve regrowth; Parkinson's disease;
KW Alzheimer's disease; stroke; traumatic spinal cord; brain injury;
KW peripheral neuropathy; ss.
XX
OS Synthetic.
OS Streptomyces hygroscopicus.
XX
FH Key Location/Qualifiers
FT CDS 3..4673
FT /*tag= a
FT /note= "no termination codon given"
XX
PN WO200020601-A2.
XX
PD 13-APR-2000.
XX
PF 01-OCT-1999; 99WO-US22886.
XX
PR 02-OCT-1998; 98US-0102748.
PR 11-MAR-1999; 99US-0123810.
PR 17-JUN-1999; 99US-0139650.
XX
PA (KOSA-) KOSAN BIOSCIENCES INC.
XX



PI Reeves C, Chu D, Khosla C, Santi D, Wu K; 
XX 

DR WPI; 2000-317716/27. 

DR P-PSDB; AAY84730. 
XX 

PT New isolated polyketide synthase nucleic acid and polyketide compounds, 

PT useful for treating e.g. transplant rejection, uveitis, multiple 

PT sclerosis, Alzheimer's disease, Parkinson's disease, stroke, or 

PT peripheral neuropathy 

XX 

PS Example 2; Page 93-96; 126pp; English. 
XX 

CC The present sequence represents module 8 of the FK-520 polyketide
CC synthase (PKS) gene cluster, containing the acyltransferase (AT)
CC domain of module 12 of rapamycin. FK-506 is a potent immunosuppressant,
CC and acts through initial formation of an intermediate complex with
CC protein immunophilins known as FK-506 binding proteins. The nucleic
CC acids are used for producing polyketide compounds. The polyketide
CC compounds can be used as immunosuppressants to prevent or treat
CC transplant rejection, graft-versus-host disease or uveitis. They can

CC also be used for treating e.g. alopecia universalis, autoimmune 

CC chronic active hepatitis, inflammatory bowel disease, multiple 

CC sclerosis, primary biliary cirrhosis, or scleroderma. They 

CC also have neurotrophic activity and can be used to promote neurite 

CC outgrowth in NGF-treated PC12 cells and in sensory neuronal cultures, 

CC and in intact animals, they promote regrowth of damaged facial and 

CC sciatic nerves, and repair lesioned serotonin and dopamine neurons in 

CC the brain. They can also be used for treating e.g. Parkinson's disease, 

CC Alzheimer's disease, stroke, traumatic spinal cord and brain injury, or 

CC peripheral neuropathies. They can also be used in agricultural and 

CC veterinary applications. 

XX 

SQ Sequence 4674 BP; 704 A; 1873 C; 1464 G; 633 T; 0 other; 



Query Match 9.5%; Score 39; DB 21; Length 4674; 

Best Local Similarity 47.7%; Pred. No. 0.086; 

Matches 114; Conservative 0; Mismatches 125; Indels 0; Gaps 

Qy 105 ccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaacccaggcccg 164 

I I I I I I I I I I I I I I I I I I I I II I I I III II 

Db 4250 ccaaatcacccaagccctcacccacataccacaacccctcaccggcatcttccacaccgc 4309 

Qy 165 tctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcc 224

II I I I I I I I II I I I I I I I I I I I II I 

Db 4310 cgccaccctcgacgacgccaccctcaccaacctcaccccccaacacctcaccaccaccct 4369 

Qy 225 catcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagac 284 

I II I II I I I I I I I I I I I I I I I I I I I I II 

Db 4370 ccaacccaaagccgacgccgcctggcacctccaccaccacacccaaaaccaacccctcac 4429

Qy 285 aagcactgtcttcttccccttctatgcaggtatccttggatggccagtcgcagccgcct 343

I I I I I I I I I I ! Ill I I I I I I I II I I I I I I II 
Db 4430 ccacttcgtcctctactccagcgccgccgccaccctcggcagccccggccaagccaact 4488



RESULT 3 



AAA14665 

ID AAA14665 standard; DNA; 4725 BP. 
XX 

AC AAA14665; 
XX 

DT 08-AUG-2000 (first entry) 
XX 

DE Nucleotide sequence of FK-520 PKS gene cluster module 8.
XX 

KW FK-520; polyketide synthase; PKS; gene cluster; immunosuppressant; 

KW immunophilin; FK-506 binding protein; polyketide compound; uveitis; 

KW transplant rejection; graft-versus-host disease; alopecia universalis;

KW autoimmune chronic active hepatitis; inflammatory bowel disease; 

KW multiple sclerosis; primary biliary cirrhosis; scleroderma; 

KW neurite outgrowth; nerve regrowth; Parkinson's disease; 

KW Alzheimer's disease; stroke; traumatic spinal cord; brain injury; 

KW peripheral neuropathy; ss. 

XX 

OS Streptomyces hygroscopicus.
XX
FH Key Location/Qualifiers
FT CDS 3..4724

FT /*tag= a 

FT /note= "no termination codon given" 
XX 

PN WO200020601-A2. 
XX 

PD 13-APR-2000. 
XX 

PF 01-OCT-1999; 99WO-US22886.
XX
PR 02-OCT-1998; 98US-0102748.
PR 11-MAR-1999; 99US-0123810.
PR 17-JUN-1999; 99US-0139650.
XX 

PA (KOSA-) KOSAN BIOSCIENCES INC. 
XX 

PI Reeves C, Chu D, Khosla C, Santi D, Wu K; 
XX 

DR WPI; 2000-317716/27. 

DR P-PSDB; AAY84729. 
XX 

PT New isolated polyketide synthase nucleic acid and polyketide compounds, 

PT useful for treating e.g. transplant rejection, uveitis, multiple
PT sclerosis, Alzheimer's disease, Parkinson's disease, stroke, or

PT peripheral neuropathy 

XX 

PS Example 2; Page 90-93; 126pp; English. 
XX 

CC The present sequence encodes module 8 of the FK-520 polyketide 

CC synthase (PKS) gene cluster of strain MA6548. FK-506 is a potent 

CC immunosuppressant, and acts through initial formation of an
CC intermediate complex with protein immunophilins known as FK-506
CC binding proteins. The nucleic acids are used for producing polyketide
CC compounds. The polyketide compounds can be used as immunosuppressants to
CC prevent or treat transplant rejection, graft-versus-host disease or

CC uveitis. They can also be used for treating e.g. alopecia universalis, 



CC autoimmune chronic active hepatitis, inflammatory bowel disease, 

CC multiple sclerosis, primary biliary cirrhosis, or scleroderma. They 

CC also have neurotrophic activity and can be used to promote neurite 

CC outgrowth in NGF-treated PC12 cells and in sensory neuronal cultures, 

CC and in intact animals, they promote regrowth of damaged facial and 

CC sciatic nerves, and repair lesioned serotonin and dopamine neurons in 

CC the brain. They can also be used for treating e.g. Parkinson's disease, 

CC Alzheimer's disease, stroke, traumatic spinal cord and brain injury, or 

CC peripheral neuropathies. They can also be used in agricultural and 

CC veterinary applications. 

XX 

SQ Sequence 4725 BP; 728 A; 2034 C; 1394 G; 569 T; 0 other; 



Query Match 9.5%; Score 39; DB 21; Length 4725;
Best Local Similarity 47.7%; Pred. No. 0.086;
Matches

Qy 105
I I I II I I I I I I I I I I I I I I I II I I I III II
Db 4301

Qy 165
II I I I J I I I I I I I I I I I I I I.I I III
Db 4361

Qy 225
III II i II I I I I I I I I I I I I I I I I I II
Db 4421

Qy 285
I I I I I I I I I III I I I I I I I I I I I I I I I
Db 4481



RESULT 4 
AAA14668 

ID AAA14668 standard; DNA; 4737 BP. 
XX 

AC AAA14668; 
XX 

DT 08-AUG-2000 (first entry)
XX 

DE Nucleotide sequence of modified FK-520 PKS gene cluster module 8. 
XX 

KW FK-520; polyketide synthase; PKS; gene cluster; immunosuppressant; 

KW immunophilin; FK-506 binding protein; polyketide compound; uveitis; 

KW transplant rejection; graft-versus-host disease; alopecia universalis;

KW autoimmune chronic active hepatitis; inflammatory bowel disease; 

KW multiple sclerosis; primary biliary cirrhosis; scleroderma; 

KW neurite outgrowth; nerve regrowth; Parkinson's disease; 

KW Alzheimer's disease; stroke; traumatic spinal cord; brain injury;

KW peripheral neuropathy; ss. 

XX 

OS Synthetic. 

OS Streptomyces hygroscopicus.
XX
FH Key Location/Qualifiers
FT CDS 3..4736

FT /*tag= a 

FT /note= "no termination codon given" 
XX 

PN WO200020601-A2. 
XX 

PD 13-APR-2000. 
XX 

PF 01-OCT-1999; 99WO-US22886.
XX
PR 02-OCT-1998; 98US-0102748.
PR 11-MAR-1999; 99US-0123810.
PR 17-JUN-1999; 99US-0139650.
XX
PA (KOSA-) KOSAN BIOSCIENCES INC.
XX 

PI Reeves C, Chu D, Khosla C, Santi D, Wu K; 
XX 

DR WPI; 2000-317716/27. 

DR P-PSDB; AAY84732.
XX 

PT New isolated polyketide synthase nucleic acid and polyketide compounds, 

PT useful for treating e.g. transplant rejection, uveitis, multiple 

PT sclerosis, Alzheimer's disease, Parkinson's disease, stroke, or

PT peripheral neuropathy 

XX 

PS Example 2; Page 99-102; 126pp; English. 
XX 

CC The present sequence represents module 8 of the FK-520 polyketide 

CC synthase (PKS) gene cluster, containing the acyltransferase (AT)
CC domain of module 12 of rapamycin. FK-506 is a potent immunosuppressant,
CC and acts through initial formation of an intermediate complex with

CC protein immunophilins known as FK-506 binding proteins. The nucleic 

CC acids are used for producing polyketide compounds. The polyketide 

CC compounds can be used as immunosuppressants to prevent or treat 

CC transplant rejection, graft-versus-host disease or uveitis. They can

CC also be used for treating e.g. alopecia universalis, autoimmune 

CC chronic active hepatitis, inflammatory bowel disease, multiple 

CC sclerosis, primary biliary cirrhosis, or scleroderma. They 

CC also have neurotrophic activity and can be used to promote neurite 

CC outgrowth in NGF-treated PC12 cells and in sensory neuronal cultures,

CC and in intact animals, they promote regrowth of damaged facial and 

CC sciatic nerves, and repair lesioned serotonin and dopamine neurons in 

CC the brain. They can also be used for treating e.g. Parkinson's disease, 

CC Alzheimer's disease, stroke, traumatic spinal cord and brain injury, or

CC peripheral neuropathies. They can also be used in agricultural and 

CC veterinary applications. 

XX 

SQ Sequence 4737 BP; 718 A; 1927 C; 1472 G; 620 T; 0 other; 

Query Match 9.5%; Score 39; DB 21; Length 4737; 
Best Local Similarity 47.7%; Pred. No. 0.087; 

Matches 114; Conservative 0; Mismatches 125; Indels 0; Gaps 



Qy 105 ccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaacccaggcccg 164

Db 4313 ccaaatcacccaagccctcacccacataccacaacccctcaccggcatcttccacaccgc 4372

Qy 165 tctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcc 224 

I I I I I I I I I I I I I I I I I I I I I I III 

Db 4373 cgccaccctcgacgacgccaccctcaccaacctcaccccccaacacctcaccaccaccct 4432

Qy 225 catcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagac 284 

I III II I I I I I I I I I I I I I I I I I I I I II 

Db 4433 ccaacccaaagccgacgccgcctggcacctccaccaccacacccaaaaccaacccctcac 4492

Qy 285 aagcactgtcttcttccccttctatgcaggtatccttggatggccagtcgcagccgcct 343 

I I I I I I I I I I I III I I I I I I I I I I I I I I I II 
Db 4493 ccacttcgtcctctactccagcgccgccgccaccctcggcagccccggccaagccaact 4551



RESULT 5
AAA14667
ID AAA14667 standard; DNA; 4767 BP.
XX
AC AAA14667;
XX
DT 08-AUG-2000 (first entry)
XX
DE Nucleotide sequence of modified FK-520 PKS gene cluster module 8.
XX
KW FK-520; polyketide synthase; PKS; gene cluster; immunosuppressant;
KW immunophilin; FK-506 binding protein; polyketide compound; uveitis;
KW transplant rejection; graft-versus-host disease; alopecia universalis;
KW autoimmune chronic active hepatitis; inflammatory bowel disease;
KW multiple sclerosis; primary biliary cirrhosis; scleroderma;
KW neurite outgrowth; nerve regrowth; Parkinson's disease;
KW Alzheimer's disease; stroke; traumatic spinal cord; brain injury;
KW peripheral neuropathy; ss.
XX
OS Synthetic.
OS Streptomyces hygroscopicus.
XX
FH Key Location/Qualifiers
FT CDS 3..4766
FT /*tag= a
FT /note= "no termination codon given"
XX
PN WO200020601-A2.
XX
PD 13-APR-2000.
XX
PF 01-OCT-1999; 99WO-US22886.
XX
PR 02-OCT-1998; 98US-0102748.
PR 11-MAR-1999; 99US-0123810.
PR 17-JUN-1999; 99US-0139650.
XX
PA (KOSA-) KOSAN BIOSCIENCES INC.
XX
PI Reeves C, Chu D, Khosla C, Santi D, Wu K;



DR WPI; 2000-317716/27. 

DR P-PSDB; AAY84731. 
XX 

PT New isolated polyketide synthase nucleic acid and polyketide compounds, 

PT useful for treating e.g. transplant rejection, uveitis, multiple 

PT sclerosis, Alzheimer's disease, Parkinson's disease, stroke, or 

PT peripheral neuropathy 

XX 

PS Example 2; Page 96-99; 126pp; English. 
XX 

CC The present sequence represents module 8 of the FK-520 polyketide 

CC synthase (PKS) gene cluster, containing the acyltransferase (AT)
CC domain of module 13 of rapamycin. FK-506 is a potent immunosuppressant,
CC and acts through initial formation of an intermediate complex with

CC protein immunophilins known as FK-506 binding proteins. The nucleic 

CC acids are used for producing polyketide compounds. The polyketide 

CC compounds can be used as immunosuppressants to prevent or treat 

CC transplant rejection, graft-versus-host disease or uveitis. They can

CC also be used for treating e.g. alopecia universalis, autoimmune 

CC chronic active hepatitis, inflammatory bowel disease, multiple 

CC sclerosis, primary biliary cirrhosis, or scleroderma. They 

CC also have neurotrophic activity and can be used to promote neurite 

CC outgrowth in NGF-treated PC12 cells and in sensory neuronal cultures, 

CC and in intact animals, they promote regrowth of damaged facial and 

CC sciatic nerves, and repair lesioned serotonin and dopamine neurons in 

CC the brain. They can also be used for treating e.g. Parkinson's disease, 

CC Alzheimer's disease, stroke, traumatic spinal cord and brain injury, or 

CC peripheral neuropathies. They can also be used in agricultural and 

CC veterinary applications. 

XX 

SQ Sequence 4767 BP; 731 A; 1945 C; 1468 G; 623 T; 0 other; 



Query Match 9.5%; Score 39; DB 21; Length 4767; 

Best Local Similarity 47.7%; Pred. No. 0.087; 

Matches 114; Conservative 0; Mismatches 125; Indels 0; Gaps 0; 



Qy 105 ccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaacccaggcccg 164 

I I I I I I I I I I I I I I I I I I I I II I I I III II 

Db 4343 ccaaatcacccaagccctcacccacataccacaacccctcaccggcatcttccacaccgc 4402 

Qy 165 tctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcc 224 

II I I I I I I I II I I I I I I I I I I I II I 

Db 4403 cgccaccctcgacgacgccaccctcaccaacctcaccccccaacacctcaccaccaccct 4462 

Qy 225 catcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagac 284 

I III II I I I I I I I I I I I III I II I II II 

Db 4463 ccaacccaaagccgacgccgcctggcacctccaccaccacacccaaaaccaacccctcac 4522 

Qy 285 aagcactgtcttcttccccttctatgcaggtatccttggatggccagtcgcagccgcct 343 

I I I I M I I II I III I I I I I I I I I I I I I I I I I 
Db 4523 ccacttcgtcctctactccagcgccgccgccaccctcggcagccccggccaagccaact 4581 



RESULT 6 
AAA14669 

ID AAA14669 standard; DNA; 4818 BP. 



XX 

AC AAA14669; 
XX 

DT 08-AUG-2000 (first entry) 
XX 

DE Nucleotide sequence of modified FK-520 PKS gene cluster module 8. 
XX 

KW FK-520; polyketide synthase; PKS; gene cluster; immunosuppressant; 

KW immunophilin; FK-506 binding protein; polyketide compound; uveitis; 

KW transplant rejection; graft-versus-host disease; alopecia universalis;

KW autoimmune chronic active hepatitis; inflammatory bowel disease; 

KW multiple sclerosis; primary biliary cirrhosis; scleroderma; 

KW neurite outgrowth; nerve regrowth; Parkinson's disease; 

KW Alzheimer's disease; stroke; traumatic spinal cord; brain injury; 

KW peripheral neuropathy; ss. 

XX 

OS Synthetic. 

OS Streptomyces hygroscopicus.
XX
FH Key Location/Qualifiers
FT CDS 3..4817

FT /*tag= a 

FT /note= "no termination codon given" 
XX 

PN WO200020601-A2. 
XX 

PD 13-APR-2000. 
XX 

PF 01-OCT-1999; 99WO-US22886.
XX
PR 02-OCT-1998; 98US-0102748.
PR 11-MAR-1999; 99US-0123810.
PR 17-JUN-1999; 99US-0139650.
XX 

PA (KOSA-) KOSAN BIOSCIENCES INC. 
XX 

PI Reeves C, Chu D, Khosla C, Santi D, Wu K; 
XX 

DR WPI; 2000-317716/27. 

DR P-PSDB; AAY84733. 
XX 

PT New isolated polyketide synthase nucleic acid and polyketide compounds, 

PT useful for treating e.g. transplant rejection, uveitis, multiple 

PT sclerosis, Alzheimer's disease, Parkinson's disease, stroke, or 

PT peripheral neuropathy 

XX 

PS Example 2; Page 102-105; 126pp; English. 
XX 

CC The present sequence represents module 8 of the FK-520 polyketide 

CC synthase (PKS) gene cluster, containing the acyltransferase (AT)
CC domain of module 13 of rapamycin. FK-506 is a potent immunosuppressant,
CC and acts through initial formation of an intermediate complex with
CC protein immunophilins known as FK-506 binding proteins. The nucleic
CC acids are used for producing polyketide compounds. The polyketide
CC compounds can be used as immunosuppressants to prevent or treat
CC transplant rejection, graft-versus-host disease or uveitis. They can

CC also be used for treating e.g. alopecia universalis, autoimmune 



CC chronic active hepatitis, inflammatory bowel disease, multiple 

CC sclerosis, primary biliary cirrhosis, or scleroderma. They 

CC also have neurotrophic activity and can be used to promote neurite 

CC outgrowth in NGF-treated PC12 cells and in sensory neuronal cultures, 

CC and in intact animals, they promote regrowth of damaged facial and 

CC sciatic nerves, and repair lesioned serotonin and dopamine neurons in 

CC the brain. They can also be used for treating e.g. Parkinson's disease,
CC Alzheimer's disease, stroke, traumatic spinal cord and brain injury, or

CC peripheral neuropathies. They can also be used in agricultural and 

CC veterinary applications. 

XX 

SQ Sequence 4818 BP; 742 A; 1982 C; 1476 G; 618 T; 0 other; 

Query Match 9.5%; Score 39; DB 21; Length 4818; 

Best Local Similarity 47.7%; Pred. No. 0.087; 

Matches 114; Conservative 0; Mismatches 125; Indels 0; Gaps 

Qy 105 ccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaacccaggcccg 164 

I I I I I I I I I I I I III II I J I II I I I III II 

Db 4394 ccaaatcacccaagccctcacccacataccacaacccctcaccggcatcttccacaccgc 4453 

Qy 165 tctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcc 224 

II I I I I I I I I I I I I I I I I I I I I III 

Db 4454 cgccaccctcgacgacgccaccctcaccaacctcaccccccaacacctcaccaccaccct 4513 

Qy 225 catcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagac 284 

I III II I I I I I I I I I I I I I I I I I I I I II 

Db 4514 ccaacccaaagccgacgccgcctggcacctccaccaccacacccaaaaccaacccctcac 4573 

Qy 285 aagcactgtcttcttccccttctatgcaggtatccttggatggccagtcgcagccgcct 343 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I II 
Db 4574 ccacttcgtcctctactccagcgccgccgccaccctcggcagccccggccaagccaact 4632 



RESULT 7 
AAH52032 

ID AAH52032 standard; DNA; 432 BP.
XX 

AC AAH52032; 
XX 

DT 04-SEP-2001 (first entry) 
XX 

DE Mycobacterium tuberculosis potential drug target gene SEQ ID 86.
XX
KW Drug target; growth; organism viability; characterisation; ds.
XX
OS Mycobacterium tuberculosis.
XX
PN WO200135317-A1.
XX
PD 17-MAY-2001.
XX
PF 13-NOV-2000; 2000WO-US31152.
XX
PR 12-NOV-1999; 99US-0165086.
PR 12-NOV-1999; 99US-0165124.
PR 01-FEB-2000; 2000US-0179531.
XX 

PA (REGC ) UNIV CALIFORNIA. 
XX 

PI Eisenberg D, Rotstein SH, Marcotte EM; 
XX 

DR WPI; 2001-329193/34. 

DR P-PSDB; AAG81181. 
XX 

PT Identifying nucleotide or polypeptide sequence for use as drug target, 

PT involves providing algorithm that analyzes a functional relationship 

PT between nucleotide or polypeptide sequences, and comparing the 

PT sequences 
XX 

PS Disclosure; Page 105; 207pp; English. 
XX 

CC This invention relates to a method for identifying a nucleotide or 

CC polypeptide sequence that may be a drug target, or essential for growth 

CC or viability of an organism. Polynucleotide sequences AAH51947 - AAH52092 

CC represent DNA encoding proteins AAG81096 - AAG81241, Mycobacterium

CC tuberculosis proteins which are potential drug targets. The DNA and 

CC protein sequences are used to illustrate the method of the invention. The 

CC method involves providing an unknown nucleotide or polypeptide sequence,

CC and comparing it to a number of sequences along with at least one 

CC algorithm capable of analysing a functional relationship between 

CC nucleotide and polypeptide sequences. The method is useful for 

CC characterising the function of nucleic acids and polypeptides that may be

CC useful as a target for a drug or essential for the growth or viability of 

CC an organism. 

XX 

SQ Sequence 432 BP; 94 A; 139 C; 130 G; 69 T; 0 other; 



Query Match 9.0%; Score 37; DB 22; Length 432;

Best Local Similarity 53.0%; Pred. No. 0.14; 

Matches 79; Conservative 0; Mismatches 70; Indels 0; Gaps 0; 

Qy 155 cccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcg 214 

MINI I I I I I I I I I I I I I I I I 111 I I I I I I 

Db 188 ccgaggcgagagcgttcctacgtaatctcgccgccggtaccgacgaacagcatcccgaca 247

Qy 215 ctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaag 274 

II I I I I I I I I I I I II I I I I I Ml I I I I I I I I I I 

Db 248 gtcaaggccggatcaccttgtcggccgaccaccgccgctacgcaagcctttccaaggact 307 

Qy 275 tcgtgaagacaagcactgtcttcttcccc 303 

III M I I I I I I II II 
Db 308 gtgtggtgatcggcgcggtcgactatctc 336 



RESULT 8 
AAX87629/C 

ID AAX87629 standard; DNA; 4932 BP. 
XX 

AC AAX87629; 
XX 

DT 26-OCT-1999 (first entry) 



XX 

DE Human insulin gene. 
XX 

KW Insulin; preproinsulin; PPINS; human; epitope; autoantigen; 

KW autoantibody; insulin-dependent diabetes mellitus; IDDM; 

KW immunoassay; diagnosis; ss. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 2364..3482 

FT /*tag= a 

FT /note= "contains an intron, exonic nucleotides 

FT 2424-2610 and 3397-3539 are included in 

FT the claimed cDNA of Claim 8" 

FT intron 2551..3336 

FT /*tag= b 

XX 

PN EP940470-A2. 
XX 

PD 08-SEP-1999. 
XX 

PF 29-DEC-1998; 98EP-0660149. 
XX 

PR 29-JAN-1998; 98US-0015399. 
XX 

PA (WALL-) WALLAC OY. 
XX 

PI Hinkkanen A; 
XX 

DR WPI; 1999-481070/41. 

DR P-PSDB; AAY06608. 
XX 

PT New fusion protein, useful for diagnosing insulin-dependent diabetes 

PT mellitus 

XX 

PS Claim 8; Page 19-22; 27pp; English. 
XX 

CC This is the nucleotide sequence of the human insulin gene coding 

CC for preproinsulin (PPINS, see AAY06608). The invention relates to a 

CC fusion protein having epitopes of at least 2 of the autoantigens 

CC glutamate decarboxylase (GAD65, see AAY06607), islet cell antigen 
CC (IA2, see AAY06606) and PPINS, in which the epitopes are connected 

CC via a linker peptide. The invention also provides cDNA encoding 

CC the fusion protein, which includes nucleotides 2424-2610 and 

CC 3397-3539 of the present sequence, a vector and an Escherichia coli 

CC cell encompassing the cDNA. The fusion protein is used in an 

CC immunoassay for the simultaneous detection of autoantibodies 

CC related to insulin-dependent diabetes mellitus (IDDM). Up to 3 

CC autoantibodies may be detected at once using the immunoassay. The 

CC presence of autoantibodies against multiple autoantigens is rare 

CC but is a strong indication of the (imminent) onset of IDDM, whereas 

CC the presence of autoantibodies to just one of the autoantigens may 

CC occur in healthy individuals. 
XX 

SQ Sequence 4932 BP; 835 A; 1531 C; 1736 G; 830 T; 0 other; 



Query Match 8.7%; Score 35.8; DB 20; Length 4932; 

Best Local Similarity 49.2%; Pred. No. 0.81; 

Matches 94; Conservative 0; Mismatches 97; Indels 0; Gaps 0; 

Qy 81 taccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaac 140 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1565 TCCCCACACCCCTGTCCCCAGACCCCTGTCCCCACACCCCTGTCCCCACACCCCTGTCCC 1506 

Qy 141 aatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtaga 200 

III I I I I I I I I I I I I I I I I I III 

Db 1505 CAGACCCCTGTCCCCACACCCCTGTCCCCGGACCCCTGTCCCCACACCCCTGTCCCCAGA 1446 

Qy 201 gccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccga 260 

III III III I I I I I I I I I I I I I I I I I I I I III 

Db 1445 CCCCTGTCCCCACACCCCTGTCCCCACACCCCTGTCCCGAGACCCCTGTCCCCACACCCC 1386 

Qy 261 cctctccaaga 271 

II III II 
Db 1385 TGTCCCCAGGA 1375 



RESULT 9 
AAH00213 

ID AAH00213 standard; DNA; 839 BP. 
XX 

AC AAH00213; 
XX 

DT 24-JUL-2001 (first entry) 
XX 

DE Bifidobacterium longum nucleotide sequence SEQ ID NO: 204. 
XX 

KW Species specific; genus specific; family specific; probe; detection; 

KW identification; algal; archaeal; bacterial; fungal; parasitical; 

KW microorganism; diagnosis; translation elongation factor Tu; toxin; 

KW translation elongation factor G; RecA recombinase; resistance; 

KW catalytic subunit of proton-translocating ATPase; antimicrobial; 

KW vaccine; primer; ds. 
XX 

OS Bifidobacterium longum. 
XX 

PN WO200123604-A2. 
XX 

PD 05-APR-2001. 
XX 

PF 28-SEP-2000; 2000WO-CA01150. 
XX 

PR 28-SEP-1999; 99CA-2283458. 

PR 19-MAY-2000; 2000CA-2307010. 
XX 

PA (INFE-) INFECTIO DIAGNOSTIC (IDI) INC. 
XX 

PI Bergeron MG, Boissinot M, Huletsky A, Menard C, Ouellette M; 

PI Picard FJ, Roy PH; 

XX 

DR WPI; 2001-245006/25. 
XX 



PT Nucleic acid sequences are used to generate universal probes and 

PT primers which can be used to identify and detect the presence of algal, 

PT archaeal, bacterial, fungal and parasitical species in a test sample - 

XX 

PS Claim 24; Page 527; 1580pp; English. 
XX 

CC The present invention describes a method for generating a repertory of 

CC nucleic acids of tuf, fus, atpD and/or recA genes from which probes 

CC and/or primers are derived. The method comprises amplifying the nucleic 

CC acids of determined algal, archaeal, bacterial, fungal and parasitical 

CC species with a combination of defined primer pairs. The method can be 

CC used for producing probes and/or primers for detecting one or more 

CC related microorganisms e.g. algae, archaea, bacteria, fungi and 

CC parasites, for universal detection and for specific and ubiquitous 

CC detection and identification of an algal, archaeal, bacterial, fungal 

CC and parasitical species, genus, family and group. A nucleic acid (I) 

CC obtained using the method of the invention can be used for the universal 

CC detection of any bacterium, fungus or parasite in a sample and for the 

CC detection of at least one antimicrobial agent resistance gene or at 

CC least one toxin gene. hexA nucleic acids are used for the specific and 

CC ubiquitous detection and for identification of Streptococcus pneumoniae. 

CC (I) can be used to design a therapeutic agent which is effective against 

CC microorganisms. Microbial species or genus or family or phylum or group 

CC which can be detected include Abiotrophia adiacens, Bordetella sp., 

CC Corynebacterium sp., Enterobacteriaceae group, Escherichia coli, 

CC Mycobacteriaceae family, Pseudomonads group, Streptococcus sp., 

CC Neisseria gonorrhoeae and Staphylococcus sp. Using DNA based tests 

CC provides faster results than substrate specificity tests as results can 

CC be determined in an hour and improved accuracy is also achieved. 

CC AAH00010 to AAH002304 represent nucleotide sequences and primers/probes 

CC which are given in the exemplification of the present invention. 

XX 

SQ Sequence 839 BP; 152 A; 291 C; 241 G; 155 T; 0 other; 



Query Match 8.7%; Score 35.6; DB 22; Length 839; 

Best Local Similarity 48.1%; Pred. No. 0.47; 

Matches 101; Conservative 0; Mismatches 109; Indels 0; Gaps 0; 

Qy 98 caacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaaccc 157 

I II II I I I I I I I I I I I I I I I I I II I 

Db 471 ccaccgtcacctccatcgagaccttccacaagaccatggacgcctgcgaggctggcgaca 530 

Qy 158 aggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctc 217 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 531 acaccggtctgcttctgcgtggtctcggccgtgacgatgtcgagcgtggccaggttgtgg 590 

Qy 218 gcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcg 277 

Mill I I I I I I I I I I I I I I I II I III I I I I I I 
Db 591 ccaagccgggctccgtcaccccgcacaccaagttcgagggcgaagtctacgtgctgacca 650 

Qy 278 tgaagacaagcactgtcttcttccccttct 307 

I I I I I I I I I I II I 

Db 651 aggacgaaggcggccgtcactcgccgttct 680 



RESULT 10 



AAA92302 

ID AAA92302 standard; DNA; 31422 BP. 
XX 

AC AAA92302; 
XX 

DT 10-JAN-2001 (first entry) 
XX 

DE S. avermitilis avermectin aglycon synthase DNA aveAII SEQ ID NO: 2. 
XX 

KW Streptomyces avermitilis; avermectin aglycon synthase; biosynthesis; 

KW multifunctional enzyme; polyketide; avermectin; veterinary drug; 

KW agrochemical; ds. 
XX 

OS Streptomyces avermitilis. 
XX 

FH Key Location/Qualifiers 

FT CDS 1..14646 

FT /*tag= a 

FT /note= "avermectin aglycon synthase protein" 

FT CDS 14824..31422 

FT /*tag= b 

FT /note= "avermectin aglycon synthase protein" 
XX 

PN WO200050605-A1. 
XX 

PD 31-AUG-2000. 
XX 

PF 23-FEB-2000; 2000WO-JP01041. 
XX 

PR 24-FEB-1999; 99JP-0046961. 
XX 

PA (KITA ) KITASATO INST. 
XX 

PI Omura S, Ikeda H; 
XX 

DR WPI; 2000-565458/52. 

DR P-PSDB; AAB23751, AAB23752. 

XX 

PT Avermectin aglycone synthase DNA and proteins encoded by all or part of 

PT it for the production of avermectin and its derivatives for drug and 

PT agrochemical use 
XX 

PS Claim 2; Page 134-203; 314pp; Japanese. 
XX 

CC The present sequence represents DNA which encodes avermectin aglycon 

CC synthase proteins. Also described are: (1) polypeptides encoded by all 

CC or part of the DNA; (2) expression vectors containing the DNA; (3) host 

CC cells transformed by the vectors; (4) preparation of the polypeptides 

CC by culture of the transformants; (5) preparation of avermectin aglycon 

CC or its derivatives by culture of transformed avermectin-producing 

CC microorganisms; and (6) oligonucleotides of 5-60 bases in length 

CC containing sense or antisense sequences from the avermectin aglycon 

CC synthase DNA. The enzymes are useful for the production of modified 

CC forms of avermectin and of the intermediates in its biosynthesis, for 

CC use as drugs, veterinary drugs and agrochemicals. 
XX 

SQ Sequence 31422 BP; 4136 A; 10238 C; 11677 G; 5371 T; 0 other; 



Query Match 8.7%; Score 35.6; DB 21; Length 31422; 

Best Local Similarity 49.5%; Pred. No. 1.9; 

Matches 92; Conservative 0; Mismatches 94; Indels 0; Gaps 0; 



Qy 76 agcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccacc 135 

I I I I i I I I I I I II III I I I I I I I I I I I I I I M I I 

Db 28258 atcgaaatcagtccccaccccaccctcgtccccgccatcgaagacaccaccgaaaacacc 28317 

Qy 136 acaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatg 195 

II II I I I I I I I I I I I I I I I I I I I I I 

Db 28318 accgaaaacatcaccgcgaccggcagcctccgccgcggcgacaacgacacccaccgcttc 28377 

Qy 196 gtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgcc 255 

I I I I I I I I I I I I II II I I I I I I I I I I I I I I 

Db 28378 ctcaccgccctcgcccacacccacaccaccggcattcggacacccaccacctggcaccac 28437 

Qy 256 gccgac 261 
I I I 

Db 28438 cactac 28443 



RESULT 11 
AAZ58381 

ID AAZ58381 standard; DNA; 12381 BP. 
XX 

AC AAZ58381; 
XX 

DT 23-MAY-2000 (first entry) 
XX 

DE Streptomyces avermitilis avermectin polyketide synthase modules 1+2. 
XX 

KW Polyketide synthase; avermectin; insecticide; ss. 
XX 

OS Streptomyces avermitilis. 
XX 

PN WO200001827-A2. 
XX 

PD 13-JAN-2000. 
XX 

PF 06-JUL-1999; 99WO-GB02158. 
XX 

PR 06-JUL-1998; 98GB-0014622. 
XX 

PA (BIOT-) BIOTICA TECHNOLOGY LTD. 

PA (PFIZ ) PFIZER INC. 

XX 

PI Kellenberger JL, Leadlay PF, Staunton J, Stutzman-Engwall KJ; 

PI McArthur HAI; 

XX 

DR WPI; 2000-182117/16. 
XX 

PT Mutated Type I polyketide synthase containing a polylinker site in an 

PT extension module for replacement of a reductive loop sequence, for 

PT producing polyketides, e.g. B1 avermectin - 
XX 



PS Disclosure; Fig 7a-f; 75pp; English. 
XX 

CC The present sequence is that of DNA encoding the first 2 modules 

CC of the avermectin polyketide synthase (PKS) of Streptomyces 

CC avermitilis. The invention relates to nucleic acids encoding a 

CC Type I PKS such as avermectin in which a polylinker with multiple 

CC restriction sites replaces 1 or more PKS genes encoding enzymes 

CC associated with reduction. Novel PKS are provided in which 

CC the reductive loop in a selected module of the Type I PKS is 

CC replaced with the equivalent segment from the same or different 

CC PKS gene cluster or by a mutated or synthetic segment. Vectors and 

CC host cells, and methods for producing novel polyketides by 

CC culturing host cells are claimed. The polyketides obtained are 

CC useful as antibiotics and insecticides. Fermentation products 

CC containing C22-C23 dihydroavermectin, ivermectin and B1 

CC avermectins are claimed. 

XX 

SQ Sequence 12381 BP; 1884 A; 4561 C; 4005 G; 1931 T; 0 other; 



Query Match 8.6%; Score 35.4; DB 21; Length 12381; 

Best Local Similarity 49.2%; Pred. No. 1.5; 

Matches 93; Conservative 0; Mismatches 96; Indels 0; Gaps 0; 

Qy 96 cccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaac 155 

II II I I I I II I I I I I I I I M I I I I I I I Mill 

Db 5308 cctcaccctccccaccacccaccaaccccaccaaacctggctcatcgccatccccgaaac 5367 

Qy 156 ccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgc 215 

M I I I I I II II I III II I I I I I I I I II I 

Db 5368 ccagacccaccacccccacatcaccaacatcctcaccaacctccaccaccacggcatcac 5427 

Qy 216 tcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagt 275 

III I I I I I I I I I I II I I III I I I I I III I 

Db 5428 ccccatccccctcaccctcaaccacacccacaccaacccccaacacctccaccacaccct 5487 



Qy 276 cgtgaagac 284 

I III 
Db 5488 ccaccacac 5496 



RESULT 12 


AAA92301 

ID AAA92301 standard; DNA; 30690 BP. 
XX 

AC AAA92301; 
XX 

DT 10-JAN-2001 (first entry) 
XX 

DE S. avermitilis avermectin aglycon synthase DNA aveAI SEQ ID NO: 1. 
XX 

KW Streptomyces avermitilis; avermectin aglycon synthase; biosynthesis; 

KW multifunctional enzyme; polyketide; avermectin; veterinary drug; 

KW agrochemical; ds. 
XX 

OS Streptomyces avermitilis. 
XX 

FH Key Location/Qualifiers 

FT CDS 1..11919 

FT /*tag= a 

FT /note= "avermectin aglycon synthase protein" 

FT CDS 11971..30690 

FT /*tag= b 

FT /note= "avermectin aglycon synthase protein" 



XX 

PN WO200050605-A1. 
XX 

PD 31-AUG-2000. 
XX 

PF 23-FEB-2000; 2000WO-JP01041. 
XX 

PR 24-FEB-1999; 99JP-0046961. 
XX 

PA (KITA ) KITASATO INST. 
XX 

PI Omura S, Ikeda H; 
XX 

DR WPI; 2000-565458/52. 

DR P-PSDB; AAB23749, AAB23750. 

XX 

PT Avermectin aglycone synthase DNA and proteins encoded by all or part of 

PT it for the production of avermectin and its derivatives for drug and 

PT agrochemical use 
XX 

PS Claim 2; Page 66-134; 314pp; Japanese. 
XX 

CC The present sequence represents DNA which encodes avermectin aglycon 

CC synthase proteins. Also described are: (1) polypeptides encoded by all 

CC or part of the DNA; (2) expression vectors containing the DNA; (3) host 

CC cells transformed by the vectors; (4) preparation of the polypeptides 

CC by culture of the transformants; (5) preparation of avermectin aglycon 

CC or its derivatives by culture of transformed avermectin-producing 

CC microorganisms; and (6) oligonucleotides of 5-60 bases in length 

CC containing sense or antisense sequences from the avermectin aglycon 
CC synthase DNA. The enzymes are useful for the production of modified 

CC forms of avermectin and of the intermediates in its biosynthesis, for 

CC use as drugs, veterinary drugs and agrochemicals. 
XX 

SQ Sequence 30690 BP; 5356 A; 12454 C; 8617 G; 4263 T; 0 other; 



Query Match 8.6%; Score 35.4; DB 21; Length 30690; 

Best Local Similarity 49.2%; Pred. No. 2.2; 

Matches 93; Conservative 0; Mismatches 96; Indels 0; Gaps 0; 

Qy 96 cccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaac 155 

II I I I I I I I I I I I I I I I I I I I I I I I I I Mill 

Db 4491 cctcaccctccccaccacccaccaaccccaccaaacctggctcatcgccatccccgaaac 4550 

Qy 156 ccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgc 215 

I I I I I I I II II I III II I I I I I I I T I I I ■ 

Db 4551 ccagacccaccacccccacatcaccaacatcctcaccaacctccaccaccacggcatcac 4610 



Qy 216 tcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagt 275 

III I I I I I I I I I I I I I I III I I I II III I 

Db 4611 ccccatccccctcaccctcaaccacacccacaccaacccccaacacctccaccacaccct 4670 

Qy 276 cgtgaagac 284 

I I I I 
Db 4671 ccaccacac 4679 



RESULT 13 
AAX53491/C 

ID AAX53491 standard; DNA; 114955 BP. 
XX 

AC AAX53491; 
XX 

DT 05-JUL-1999 (first entry) 
XX 

DE Human adenosine A1 receptor antisense oligonucleotide fragment. 
XX 

KW Antisense oligonucleotide; multiple target; antisense treatment; 

KW impaired respiration; inflammation; lung disease; 

KW pulmonary vasoconstriction; inflammation; allergic rhinitis; 

KW acute asthma; allergy; asthma; impeded respiration; 

KW respiratory distress syndrome; pain; cystic fibrosis; 

KW pulmonary hypertension; pulmonary vasoconstriction; emphysema; 

KW chronic obstructive pulmonary disease; leukemia; lymphoma; carcinoma; 

KW colon cancer; breast cancer; lung cancer; pancreatic cancer; 

KW hepatocellular carcinoma; kidney cancer; melanoma; hepatic metastasis; 

KW prostate cancer; ss. 

XX 

OS Synthetic. 
XX 

PN WO9913886-A1. 
XX 

PD 25-MAR-1999. 
XX 

PF 17-SEP-1998; 98WO-US19419. 
XX 

PR 09-JUN-1998; 98US-0093972. 

PR 17-SEP-1997; 97US-0059160. 
XX 

PA (UYEC-) UNIV EAST CAROLINA. 
XX 

PI Nyce JW; 
XX 

DR WPI; 1999-229400/19. 
XX 

PT New antisense oligonucleotides used in treatment of, e.g. pulmonary 

PT vasoconstriction 

XX 

PS Disclosure; Page 37; 120pp; English. 
XX 

CC The specification describes antisense oligonucleotides (AAX52869-X55271) 

CC directed against at least 2 mRNAs selected from target genes, coding and 

CC non-coding regions of RNAs corresponding to target genes, gene 

CC initiation codons, genomic flanking regions, intron-exon borders, the 

CC 5'-end, the 3'-end and the juxta-section between coding and non-coding 

CC regions and all segments of RNAs encoding proteins associated with one 



CC or more diseases, conditions or mixtures. The antisense oligonucleotides 

CC may be derived from sequences AAX55272-74. These multiple target 

CC oligonucleotides (specifically AAX55180-271) can be used for the 

CC antisense treatment of diseases and conditions. Typical diseases and 

CC conditions are those associated with impaired respiration and 

CC inflammation, including lung diseases, pulmonary vasoconstriction, 

CC inflammation, allergic rhinitis, acute asthma, allergies, asthma, impeded 

CC respiration, respiratory distress syndrome, pain, cystic fibrosis, 

CC pulmonary hypertension, pulmonary vasoconstriction, emphysema, chronic 

CC obstructive pulmonary disease (COPD), and cancers such as leukemias, 

CC lymphomas, carcinomas e.g. colon cancer, breast cancer, lung cancer, 

CC pancreatic cancer, hepatocellular carcinoma, kidney cancer, melanoma, 

CC hepatic metastases, as well as all types of cancers which may metastasize 

CC or have metastasized to the lungs, including breast and prostate cancer. 

XX 

SQ Sequence 114955 BP; 6071 A; 29417 C; 36712 G; 21328 T; 21427 other; 

Query Match 8.5%; Score 34.8; DB 20; Length 114955; 

Best Local Similarity 43.5%; Pred. No. 5.5; 

Matches 74; Conservative 7; Mismatches 89; Indels 0; Gaps 0; 

Qy 173 ggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcacca 232 

I I I I II I I I I I I : : I I I I III I III I : 

Db 108267 GGCGCCGCCGCCCCCGCCNNHNNNSCCVAGGCGAGCCAGGCGCCGCCGCCCCCGCCNNHN 108208 

Qy 233 tgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactg 292 

III: II 11111111111111 : I : I I I II 

Db 108207 NNSCCCVAGGCGAGCCAGGCGCCGCCGCCCCCGCCNNHNNNSGCCCVAGGCGAGCCAGGC 108148 

Qy 293 tcttcttccccttctatgcaggtatccttggatggccagtcgcagccgcc 342 

I I I I I I I : II II I I I I I I I I I I I I I I 

Db 108147 GCCGCCGCCCCCGCCNNHNNNSGGCCCVAGGCGAGCCAGGCGCCGCCGCC 108098 



RESULT 14 
AAV22334 

ID AAV22334 standard; DNA; 2634 BP. 

XX 

AC AAV22334; 
XX 

DT 17-AUG-1998 (first entry) 
XX 

DE Microbispora thermorosea pyruvate orthophosphate dikinase gene. 
XX 

KW Pyruvate orthophosphate dikinase; PPDK; pyrophosphoric acid; 

KW assay; ds. 

XX 

OS Microbispora thermorosea strain IFO 14047. 
XX 

PN GB2317892-A. 
XX 

PD 08-APR-1998. 
XX 

PF 02-OCT-1997; 97GB-0021083. 
XX 

PR 03-OCT-1996; 96JP-0281304. 



XX 

PA (KIKK ) KIKKOMAN CORP. 
XX 

PI Eisaki N, Horiuchi T, Nagahara A, Tatsumi H; 
XX 

DR WPI; 1998-171634/16. 

DR P-PSDB; AAW56116. 
XX 

PT New gene encoding pyruvate orthophosphate dikinase, useful for 

PT recombinant production of enzyme - of use in assaying pyrophosphoric 

PT acid in catalysing conversion of AMP to ATP 

XX 

PS Claim 4; Page 17-24; 31pp; English. 
XX 

CC This DNA sequence comprises the coding region of the 

CC pyruvate orthophosphate dikinase (PPDK) gene of Microbispora 

CC thermorosea IFO 14047. The gene codes for a 878-amino acid PPDK 

CC enzyme (see AAW56116) that catalyses a reaction for forming ATP, 

CC pyruvic acid and pyrophosphoric acid from AMP, phosphoenolpyruvic 

CC acid and pyrophosphoric acid. The PPDK gene was isolated from a 

CC genomic DNA library of M. thermorosea IFO 14047 using a probe 

CC prepared by PCR using a primer (see AAV22313) based on the N-terminal 

CC peptide of PPDK and a primer (see AAV22314) based on a Clostridium 

CC symbiosum PPDK peptide. The gene has been amplified by PCR (see 

CC AAV22408-09) and inserted into expression vector pUTE500K to give 

CC plasmid pPDK35. Isolation of the gene allows production of PPDK in 

CC large amounts by culture of host cells, especially Escherichia coli 

CC TG1 (pPDK35) carrying the PPDK gene. PPDK is used in quantifiable 

CC assays of pyrophosphoric acid. The assay comprises reacting a test 

CC solution with phosphoenolpyruvic acid and AMP in the presence of 

CC PPDK. The AMP is converted by PPDK to ATP which is detected by its 

CC activation of firefly luciferase (claimed). 

XX 

SQ Sequence 2634 BP; 407 A; 933 C; 937 G; 357 T; 0 other; 



Query Match 8.3%; Score 34.2; DB 19; Length 2634; 

Best Local Similarity 55.5%; Pred. No. 1.9; 

Matches 66; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 

Qy 180 cgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccc 239 

I I I I I I I I I I I I I I I I I I I I I III II I II III 

Db 1353 cgtggcccgcggcatgggcaagacctgcgtgtgcggggccgaggaactggaagtggaccc 1412 

Qy 240 tcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactgtcttct 298 

I I I I I I I I I I I I I I II I II I I I I I I I I II II III 

Db 1413 gcacgcccgccgcttcaccgcgcccggcgggatcgtcgtgaacgagggcgaggtgatct 1471 



RESULT 15 
AAF67759 

ID AAF67759 standard; DNA; 913 BP. 
XX 

AC AAF67759; 
XX 

DT 11-APR-2001 (first entry) 
XX 



DE Corynebacterium glutamicum MCT protein encoding DNA SEQ ID NO: 33. 
XX 

KW Corynebacterium glutamicum; Brevibacterium lactofermentum; MCT; 

KW membrane construction and membrane transport protein; petroleum spill; 

KW hydrocarbon degradation; gram positive aerobic bacterium; marker; 

KW identification; microorganism; fine chemical production; transformation; 

KW genome mapping; genetic engineering; ds. 

XX 

OS Corynebacterium glutamicum. 
XX 

PN WO200100805-A2. 
XX 

PD 04-JAN-2001. 
XX 

PF 23-JUN-2000; 2000WO-IB00926. 
XX 

PR 25-JUN-1999; 99US-0141031. 

PR 08-JUL-1999; 99DE-1031454. 

PR 08-JUL-1999; 99DE-1031478. 

PR 08-JUL-1999; 99DE-1031563. 

PR 09-JUL-1999; 99DE-1032122. 

PR 09-JUL-1999; 99DE-1032124. 

PR 09-JUL-1999; 99DE-1032125. 

PR 09-JUL-1999; 99DE-1032128. 

PR 09-JUL-1999; 99DE-1032180. 

PR 09-JUL-1999; 99DE-1032182. 

PR 09-JUL-1999; 99DE-1032190. 

PR 09-JUL-1999; 99DE-1032191. 

PR 09-JUL-1999; 99DE-1032209. 

PR 09-JUL-1999; 99DE-1032212. 

PR 09-JUL-1999; 99DE-1032227. 

PR 09-JUL-1999; 99DE-1032228. 

PR 09-JUL-1999; 99DE-1032229. 

PR 09-JUL-1999; 99DE-1032230. 

PR 14-JUL-1999; 99DE-1032927. 

PR 14-JUL-1999; 99DE-1033005. 

PR 14-JUL-1999; 99DE-1033006. 

PR 27-AUG-1999; 99DE-1040764. 

PR 27-AUG-1999; 99DE-1040765. 

PR 27-AUG-1999; 99DE-1040766. 

PR 27-AUG-1999; 99DE-1040830. 

PR 27-AUG-1999; 99DE-1040831. 

PR 27-AUG-1999; 99DE-1040832. 

PR 27-AUG-1999; 99DE-1040833. 

PR 31-AUG-1999; 99DE-1041378. 

PR 31-AUG-1999; 99DE-1041379. 

PR 31-AUG-1999; 99DE-1041395. 

PR 03-SEP-1999; 99DE-1042077. 

PR 03-SEP-1999; 99DE-1042078. 

PR 03-SEP-1999; 99DE-1042079. 

PR 03-SEP-1999; 99DE-1042088. 
XX 

PA (BADI ) BASF AG. 
XX 

PI Pompejus M, Kroeger B, Schroeder H, Zelder O, Haberhauer G; 
XX 

DR WPI; 2001-071486/08. 



DR P-PSDB; AAB76526. 
XX 

PT Corynebacterium glutamicum nucleic acids encoding membrane construction 

PT and membrane transport proteins or their portions, useful for typing or 

PT identifying C. glutamicum or related bacteria, and as markers for 

PT transformation - 
XX 

PS Claim 3; Page 185-186; 1119pp; English. 
XX 

CC AAF67743 to AAF68080 encode the Corynebacterium glutamicum membrane 

CC construction and membrane transport (MCT) proteins given in AAB76510 to 

CC AAB76847. The MCT nucleic acids and proteins are useful in the 

CC identification of microorganisms which can be used to produce fine 

CC chemicals, for modulating fine chemical production in C. glutamicum or 

CC related bacteria (e.g. Brevibacterium lactofermentum), the typing or 

CC identification of C. glutamicum or related bacteria, as reference points 

CC for mapping C. glutamicum genome, and as markers for transformation. 

CC AAF68082 and AAF68082 represent sequencing primers which are used in an 

CC example from the present invention. 

XX 

SQ Sequence 913 BP; 233 A; 321 C; 206 G; 153 T; 0 other; 



Query Match 8.3%; Score 34; DB 22; Length 913; 

Best Local Similarity 48.9%; Pred. No. 1.5; 

Matches 91; Conservative 0; Mismatches 95; Indels 0; Gaps 0; 

Qy 118 cacttcaaccaaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgt 177 

I II I I I I I I I I II I I I I I I I I I I I I I I I II 
Db 638 ctctccggcgcagcccccaacaccgttccttttgaaaccctgaccagcgcagcaatgggc 697 

Qy 178 agcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacc 237 

I I I I I I I I I I II I I I I I I I Mil l I I I I 

Db 698 ggcgacggcgacgacgtagtttcagaacccaccgtgaccaaagaatccgtcgcgctgatc 757 

Qy 238 cctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactgtcttc 297 

I Mill M I I II I I I M I I I II I II I II 
Db 758 ctctacacctccggcaccaccggacgccccaagggtgcccagctcacccacggaaacctg 817 



Qy 298 ttcccc 303 

III II 
Db 818 ttctcc 823 



Search completed: February 7, 2002, 11:00:37 
Job time: 5023 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: February 7, 2002, 11:12:24 ; Search time 172.96 Seconds 
(without alignments) 
538.173 Million cell updates/sec 

Title: US-09-394-745-6603 
Perfect score: 411 
Sequence: 1 agcaaaagcatagagatcca aggagaagaggaagggaccg 411 



Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 

Searched: 351203 seqs, 113238999 residues 

Total number of hits satisfying chosen parameters: 702406 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : Issued_Patents_NA:* 

1: /cgn2_6/ptodata/2/ina/5A_COMB.seq:* 
2: /cgn2_6/ptodata/2/ina/5B_COMB.seq:* 
3: /cgn2_6/ptodata/2/ina/6A_COMB.seq:* 
4: /cgn2_6/ptodata/2/ina/6B_COMB.seq:* 
5: /cgn2_6/ptodata/2/ina/PCTUS_COMB.seq:* 
6: /cgn2_6/ptodata/2/ina/backfiles1.seq:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
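
The percentages printed with each alignment in this listing appear to follow directly from the other figures on the same lines. The sketch below is only an inference from the values shown in this report, not a statement of how the search program computes them: it assumes Query Match is the score divided by the perfect score (411 for this query), and Best Local Similarity is the number of matches divided by the aligned length (matches + conservative + mismatches + indels). The function names are hypothetical.

```python
def query_match_pct(score, perfect_score):
    """Assumed relation: score as a percentage of the perfect score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Assumed relation: matches as a percentage of all aligned positions."""
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)


# Figures copied from alignments in this report (perfect score 411).
print(query_match_pct(37, 411))                   # reported as 9.0%
print(best_local_similarity_pct(79, 0, 70, 0))    # reported as 53.0%
print(best_local_similarity_pct(74, 7, 89, 0))    # reported as 43.5%
print(best_local_similarity_pct(20, 102, 131, 0)) # reported as 7.9%
```

Under these assumptions the computed values agree with the printed ones for the records checked; Pred. No. is not reproduced here because it depends on the full score distribution, which the listing does not include.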

SUMMARIES 
ALIGNMENTS 



RESULT 1 
US-09-007-005-17/c 

; Sequence 17, Application US/09007005B 

; Patent No. 6258558 

; GENERAL INFORMATION: 

; APPLICANT: Szostak, Jack W. 

; APPLICANT: Roberts, Richard W. 

; APPLICANT: Liu, Rihe 

; TITLE OF INVENTION: SELECTION OF PROTEINS USING RNA-PROTEIN 

; TITLE OF INVENTION: FUSIONS 

; FILE REFERENCE: 00786/350003 

; CURRENT APPLICATION NUMBER: US/09/007,005B 

; CURRENT FILING DATE: 1998-01-14 

; EARLIER APPLICATION NUMBER: 60/035,963 

; EARLIER FILING DATE: 1997-01-27 

; EARLIER APPLICATION NUMBER: 60/064,491 

; EARLIER FILING DATE: 1997-11-06 

; NUMBER OF SEQ ID NOS: 33 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 17 

; LENGTH: 289 

; TYPE: RNA 
; ORGANISM: Artificial Sequence 

; FEATURE: 

; OTHER INFORMATION: Translation template 
; FEATURE: 

; NAME/KEY: misc_feature 
; LOCATION: (1)...(289) 

; OTHER INFORMATION: n = A,T,C or G 
US-09-007-005-17 



Query Match 8.5%; Score 35; DB 4; Length 289; 

Best Local Similarity 7.9%; Pred. No. 0.04; 

Matches 20; Conservative 102; Mismatches 131; Indels 0; Gaps 0; 

Qy 91 cccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttca 150 

: I : I : : I : : I : I I : I : I I : I : : : : : : : : : : : : 

Db 253 YCYAYAYGYAYGYTYTYAYCYGYCYAYGYCYTYGYSYNYNYSYNYNYSYNYNYSYNYNYS 194 

Qy 151 gtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccacccc 210 

Db 193 YNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYS 134 

Qy 211 ttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaag 270 

Db 133 YNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYS 74 

Qy 271 aaagtcgtgaagacaagcactgtcttcttccccttctatgcaggtatccttggatggcca 330 



Db 73 YNYNYSYNYNYSYNYNYCYAYTYTYGYTYAYAYTYTYGYTYAYAYAYTYAYGYTYAYAYT 14 

Qy 331 gtcgcagccgcct 343 

I : I : I : : I : 
Db 13 YTYGYTYCYCYCY 1 



RESULT 2 
US-09-244-796-17/C 

; Sequence 17, Application US/09244796 

; Patent No. 6281344 

; GENERAL INFORMATION: 

; APPLICANT: Szostak, Jack W. 

; APPLICANT: Roberts, Richard W. 

; APPLICANT: Liu, Rihe 

; TITLE OF INVENTION: SELECTION OF PROTEINS USING RNA-PROTEIN 

; TITLE OF INVENTION: FUSIONS 

; FILE REFERENCE: 00786/350007 

; CURRENT APPLICATION NUMBER: US/09/244,796 

; CURRENT FILING DATE: 1999-02-05 

; EARLIER APPLICATION NUMBER: 60/035,963 

; EARLIER FILING DATE: 1997-01-27 

; EARLIER APPLICATION NUMBER: 60/064,491 

; EARLIER FILING DATE: 1997-11-06 

; EARLIER APPLICATION NUMBER: 09/007,005 

; EARLIER FILING DATE: 1998-01-14 

; NUMBER OF SEQ ID NOS: 33 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 17 

; LENGTH: 289 

; TYPE: RNA 
; ORGANISM: Artificial Sequence 

; FEATURE: 

; OTHER INFORMATION: Translation template 
; FEATURE: 

; NAME/KEY: misc_feature 
; LOCATION: (1)...(289) 
; OTHER INFORMATION: n = A,T,C or G 
US-09-244-796-17 



Query Match 8.5%; Score 35; DB 4; Length 289; 

Best Local Similarity 7.9%; Pred. No. 0.04; 

Matches 20; Conservative 102; Mismatches 131; Indels 0; Gaps 0; 

Qy 91 cccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttca 150 

: I : I : : | : : | : I I : I : I I : I : : : : : : : : : : : : 

Db 253 YCYAYAYGYAYGYTYTYAYCYGYCYAYGYCYTYGYSYNYNYSYNYNYSYNYNYSYNYNYS 194 

Qy 151 gtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccacccc 210 

Db 193 YNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYS 134 

Qy 211 ttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaag 270 

Db 133 YNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYSYNYNYS 74 

Qy 271 aaagtcgtgaagacaagcactgtcttcttccccttctatgcaggtatccttggatggcca 330 

: : : i : : : : | : : : | : : : : | : I : : : I : : 

Db 73 YNYNYSYNYNYSYNYNYCYAYTYTYGYTYAYAYTYTYGYTYAYAYAYTYAYGYTYAYAYT 14 

Qy 331 gtcgcagccgcct 343 

I : I : I : : I : 
Db 13 YTYGYTYCYCYCY 1 



RESULT 3 
US-08-232-463-14 

; Sequence 14, Application US/08232463 
; Patent No. 5670367 
; GENERAL INFORMATION: 
; APPLICANT: DORNER, F. 
; APPLICANT: SCHEIFLINGER, F. 
; APPLICANT: FALKNER, F. G. 
; TITLE OF INVENTION: RECOMBINANT FOWLPOX VIRUS 
; NUMBER OF SEQUENCES: 52 
; CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Foley & Lardner 
; STREET: 1800 Diagonal Road, Suite 500 
; CITY: Alexandria 
; STATE: VA 
; COUNTRY: USA 
; ZIP: 22313-0299 
; COMPUTER READABLE FORM: 
; MEDIUM TYPE: Floppy disk 
; COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 
; SOFTWARE: PatentIn Release #1.0, Version #1.25 
; CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/08/232,463 
; FILING DATE: 
; CLASSIFICATION: 435 
; PRIOR APPLICATION DATA: 
; APPLICATION NUMBER: US/07/935,313 
; FILING DATE: 
; APPLICATION NUMBER: EP 91 114 300.6 
; FILING DATE: 26-AUG-1991 
; ATTORNEY/AGENT INFORMATION: 
; NAME: BENT, Stephen A. 
; REGISTRATION NUMBER: 29,768 
; REFERENCE/DOCKET NUMBER: 30472/114 IMMU 
; TELECOMMUNICATION INFORMATION: 
; TELEPHONE: (703)836-9300 
; TELEFAX: (703)683-4109 
; TELEX: 899149 
; INFORMATION FOR SEQ ID NO: 14: 
; SEQUENCE CHARACTERISTICS: 
; LENGTH: 7218 base pairs 
; TYPE: nucleic acid 
; STRANDEDNESS: single 
; TOPOLOGY: linear 
; IMMEDIATE SOURCE: 
; CLONE: pTZgpt-Fls 
US-08-232-463-14 



Query Match 8.5%; Score 35; DB 1; Length 7218; 

Best Local Similarity 3.1%; Pred. No. 0.2; 

Matches 11; Conservative 192; Mismatches 152; Indels 0; Gaps 0; 

Qy 17 tccatcttctctgctcaatcaattacacaacaagagcattctagatttgagttcatccta 76 

Db 1102 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1161 

Qy 77 gcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccacca 136 

Db 1162 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1221 

Qy 137 caacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatgg 196 

Db 1222 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1281 

Qy 197 tagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccg 256 

Db 1282 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1341 

Qy 257 ccgacctctccaagaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggta 316 

Db 1342 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1401 

Qy 317 tccttggatggccagtcgcagccgcctggtggttcaacggaaacatgtgactctt 371 

: : : : : : : : : : : : : : : : : : II I III I I II I 
Db 1402 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYGTACCAAATTCTTCTATCTCTT 1456



RESULT 4 
US-08-941-936-1 

; Sequence 1, Application US/08941936 
; Patent No. 6054305 



GENERAL INFORMATION:
APPLICANT: Tatsumi, Hiroki
APPLICANT: Eisaki, Naoki
APPLICANT: Horiuchi, Tatsuo
APPLICANT: Nagahara, Ayumu
TITLE OF INVENTION: Pyruvate Orthophosphate Dikinase Gene,
TITLE OF INVENTION: Recombinant DNA, and Process For Producing Pyruvate
TITLE OF INVENTION: Orthophosphate Dikinase
NUMBER OF SEQUENCES: 8
CORRESPONDENCE ADDRESS: 

ADDRESSEE: MEDLEN & CARROLL, LLP 
STREET: 220 Montgomery Street, Suite 2200 
CITY: San Francisco 
STATE: CA
COUNTRY: US
ZIP: 94104
COMPUTER READABLE FORM:

MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/941,936 
FILING DATE: 01-OCT-1997 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Carroll, Peter G. 
REGISTRATION NUMBER: 32,837
REFERENCE/DOCKET NUMBER: HIRAKI-03009 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 415-705-8410 
TELEFAX: 415-397-8338 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 2634 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear
MOLECULE TYPE: DNA (genomic)
ORIGINAL SOURCE:

ORGANISM: Microbispora thermorosea
STRAIN: IFO 14047
FEATURE:

NAME/KEY: CDS
LOCATION: 1..2634
US-08-941-936-1 



Query Match 8.3%; Score 34.2; DB 3; Length 2634; 

Best Local Similarity 55.5%; Pred. No. 0.22; 

Matches 66; Conservative 0; Mismatches 53; Indels 0; Gaps 0; 

Qy 180 cgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccc 239 

I I I I I I I I I I I I I I I I I I I I I III II I II III 

Db 1353 CGTGGCCCGCGGCATGGGCAAGACCTGCGTGTGCGGGGCCGAGGAACTGGAAGTGGACCC 1412 



Qy 240 tcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactgtcttct 298 



Db 1413 GCACGCCCGCCGCTTCACCGCGCCCGGCGGGATCGTCGTGAACGAGGGCGAGGTGATCT 1471 



RESULT 5 
US-09-060-756-4 

; Sequence 4, Application US/09060756 

; Patent No. 6183957 

; GENERAL INFORMATION: 

; APPLICANT: Cole, Stewart 

APPLICANT: Buchrieser-Brosch, Roland 
; APPLICANT: Gordon, Stephen 
; APPLICANT: Billault, Alain 

; TITLE OF INVENTION: METHOD FOR ISOLATING A POLYNUCLEOTIDE OF INTEREST FROM 
; TITLE OF INVENTION: THE GENOME OF A MYCOBACTERIUM USING A BAC-BASED DNA 
; TITLE OF INVENTION: LIBRARY APPLICATION TO THE DETECTION OF MYCOBACTERIA 
; FILE REFERENCE: 3495-0169 

; CURRENT APPLICATION NUMBER: US/09/060,756 
; CURRENT FILING DATE: 1998-04-16 
; NUMBER OF SEQ ID NOS: 743
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 4 

LENGTH: 1280 

TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 
US-09-060-756-4 



Query Match 8.0%; Score 33; DB 4; Length 1280; 

Best Local Similarity 47.8%; Pred. No. 0.38; 



Matches 96; Conservative 0; Mismatches 105; Indels 0; Gaps

Qy 160 gcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgc 219

1 1 1 1 1 I MI MINI II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Db 787 ggcggtcagttatggggtagcggcggcgccggcgtcgaaggcggcgcagccttaagcgtc 846

Qy 220 aatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtg 279

III 1 III 1 1 1 1 1 II 1 1 1 1 1 II 1 1 III

Db 847 ggcgacaccggcggggccggtggcgtcggcggcagcgccgggctgatcggcaccggcggc 906

Qy 280 aagacaagcactgtcttcttccccttctatgcaggtatccttggatggccagtcgcagcc 339

II III II 1 1 1 1 1 II II 1 II HI 1 1 1 1 1 1

Db 907 aacggcggcaacggcggcaccggcgccaacgccggcagccccggaaccggcggcgccggc 966

Qy 340 gcctggtggttcaacggaaac 360

1 II II 1 II 1 1

Db 967 gggttgctgctgggccaaaac 987





RESULT 6 
US-08-858-003-1 

; Sequence 1, Application US/08858003 

; Patent No. 6060234 

; GENERAL INFORMATION:

APPLICANT: Katz, Leonard 
; APPLICANT: Stassi, Diane L. 
; APPLICANT: Summers Jr., Richard G. 



APPLICANT: Ruan, Xiaoan 
APPLICANT: Pereda-Lopez, Ana 
APPLICANT: Kakavas, Stephan J. 

TITLE OF INVENTION: NOVEL POLYKETIDE DERIVATIVES 

TITLE OF INVENTION: AND RECOMBINANT METHODS FOR MAKING SAME 

NUMBER OF SEQUENCES: 34 

CORRESPONDENCE ADDRESS: 

ADDRESSEE: Abbott Laboratories 

STREET: 100 Abbott Park Rd. 

CITY: Abbott Park 

STATE: Illinois 

COUNTRY: USA 

ZIP: 60064-3500 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette 

COMPUTER: IBM Compatible 

OPERATING SYSTEM: DOS 

SOFTWARE: FastSEQ Version 2.0 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/858,003

FILING DATE: 16-MAY-1979 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 

FILING DATE: 
ATTORNEY/AGENT INFORMATION: 

NAME: Dianne Casuto 

REGISTRATION NUMBER: P-40,943 

REFERENCE/DOCKET NUMBER: 4952.US.P2
TELECOMMUNICATION INFORMATION:

TELEPHONE: (847)-938-3137

TELEFAX: (847)-938-2623

TELEX: 

; INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 925 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
US-08-858-003-1 



Query Match 8.0%; Score 32.8; DB 3; Length 925; 

Best Local Similarity 49.4%; Pred. No. 0.37; 

Matches 85; Conservative 0; Mismatches 87; Indels 0; Gaps 0; 

Qy 96 cccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaac 155 

II I I I I I I I I I I I I I II I I I I I I I I I I III I I II 
Db 408 CCACACCCTCCAACCCCACCTCGACAACCACCACGACACCATCTCCATCGCCGCCATCAA 467



Qy 156 ccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgc 215 

I III II I I II MM II I I I I I I I I 

Db 468 CGGCCCCCACGCCACCGTCCTCTCCGGCGACCGCACCACCCTCCACCACATCGCCACCCA 527



Qy 216 tcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctcc 267 

I I I I III I I I I I I I I I I I II III III 

Db 528 ACTCAACACCAAACCCTTCACCACCACCCTCAACACCCTCACCCACCACCCC 579



RESULT 7 
US-09-078-166-1 

Sequence 1, Application US/09078166 
Patent No. 6063561 
GENERAL INFORMATION: 

APPLICANT: Katz, Leonard 
APPLICANT: Stassi, Diane L. 
APPLICANT: Summers Jr., Richard G. 
APPLICANT: Ruan, Xiaoan 
APPLICANT: Pereda-Lopez, Ana 
APPLICANT: Kakavas, Stephan J. 

TITLE OF INVENTION: NOVEL POLYKETIDE DERIVATIVES
TITLE OF INVENTION: AND RECOMBINANT METHODS FOR MAKING SAME 
NUMBER OF SEQUENCES: 44
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Abbott Laboratories 
STREET: 100 Abbott Park Rd. 
CITY: Abbott Park 
STATE: Illinois 
COUNTRY: USA 
ZIP: 60064-3500 
COMPUTER READABLE FORM: 
MEDIUM TYPE: Diskette 
COMPUTER: IBM Compatible 
OPERATING SYSTEM: DOS 
SOFTWARE: FastSEQ Version 2.0 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/078,166 
FILING DATE: 16-MAY-1979 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 
APPLICATION NUMBER: 
FILING DATE: 
ATTORNEY/AGENT INFORMATION: 
NAME: Dianne Casuto 
REGISTRATION NUMBER: P-40,943
REFERENCE/DOCKET NUMBER: 4952.US.P2
TELECOMMUNICATION INFORMATION:
TELEPHONE: (847)-938-3137
TELEFAX: (847)-938-2623
TELEX: 

INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 925 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear 
US-09-078-166-1 



Query Match 8.0%; Score 32.8; DB 3; Length 925; 

Best Local Similarity 49.4%; Pred. No. 0.37; 

Matches 85; Conservative 0; Mismatches 87; Indels 0; Gaps 0; 



Qy 96 cccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaac 155

II 1 1 1 1 1 1 1 1 1 1 1 1 1 II I 1 1 1 1 1 1 1 1 1 III I I II

Db 408 CCACACCCTCCAACCCCACCTCGACAACCACCACGACACCATCTCCATCGCCGCCATCAA 467

Qy 156 ccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgc 215 

I III II I I I I I I I I I I I I I I I I I I I 

Db 468 CGGCCCCCACGCCACCGTCCTCTCCGGCGACCGCACCACCCTCCACCACATCGCCACCCA 527

Qy 216 tcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctcc 267 

I I I I III I I I I I I I I I I I I I I I I I I I 

Db 528 ACTCAACACCAAACCCTTCACCACCACCCTCAACACCCTCACCCACCACCCC 579



RESULT 8 
US-08-997-467-1 

Sequence 1, Application US/08997467 
Patent No. 6200813 
GENERAL INFORMATION: 

APPLICANT: Katz, Leonard 
APPLICANT: Stassi, Diane L. 
APPLICANT: Summers Jr., Richard G. 
APPLICANT: Ruan, Xiaoan 
APPLICANT: Pereda-Lopez, Ana 
APPLICANT: Kakavas, Stephan J. 

TITLE OF INVENTION: NOVEL POLYKETIDE DERIVATIVES 
TITLE OF INVENTION: AND RECOMBINANT METHODS FOR MAKING SAME 
NUMBER OF SEQUENCES: 34 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Abbott Laboratories 
STREET: 100 Abbott Park Rd. 
CITY: Abbott Park 
STATE: Illinois 
COUNTRY: USA 
ZIP: 60064-3500
COMPUTER READABLE FORM:
MEDIUM TYPE: Diskette
COMPUTER: IBM Compatible
OPERATING SYSTEM: DOS
SOFTWARE: FastSEQ Version 2.0
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/997,467
FILING DATE: 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/858,003 
FILING DATE: 16-MAY-1997 
ATTORNEY/AGENT INFORMATION: 
NAME: Dianne Casuto 
REGISTRATION NUMBER: P-40,943 
REFERENCE/DOCKET NUMBER: 4952.US.P2
TELECOMMUNICATION INFORMATION:
TELEPHONE: (847)-938-3137
TELEFAX: (847)-938-2623
TELEX: 

INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 925 base pairs 
TYPE: nucleic acid 



STRANDEDNESS: double 
TOPOLOGY: linear 
US-08-997-467-1 



Query Match 8.0%; Score 32.8; DB 4; Length 925; 

Best Local Similarity 49.4%; Pred. No. 0.37; 

Matches 85; Conservative 0; Mismatches 87; Indels 0; Gaps 0; 

Qy 96 cccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaac 155 

II I I I I I I II I I I I I II I I I I I I I I I I III I I II 
Db 408 CCACACCCTCCAACCCCACCTCGACAACCACCACGACACCATCTCCATCGCCGCCATCAA 467

Qy 156 ccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgc 215 

I III II I I I I I I I I I I I I I I I I I I I 

Db 468 CGGCCCCCACGCCACCGTCCTCTCCGGCGACCGCACCACCCTCCACCACATCGCCACCCA 527

Qy 216 tcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctcc 267 

I I I I III I I I I I I I I I I I II III I II 

Db 528 ACTCAACACCAAACCCTTCACCACCACCCTCAACACCCTCACCCACCACCCC 579



RESULT 9 
US-08-764-233A-1 

Sequence 1, Application US/08764233A 
Patent No. 5716849 
GENERAL INFORMATION: 

APPLICANT: Ligon, James M. 
APPLICANT: Schupp, Thomas 
APPLICANT: Beck, James J. 
APPLICANT: Hill, Dwight S. 
APPLICANT: Neff, Snezanna 
APPLICANT: Ryals, John A. 

TITLE OF INVENTION: Genes For The Biosynthesis Of Soraphen 
NUMBER OF SEQUENCES: 10 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Ciba-Geigy Corporation 
STREET: 520 White Plains Road, P.O. Box 2005 
CITY: Tarrytown 
STATE: NY 
COUNTRY: USA 
ZIP: 10591 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/764,233A
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/729,214 
FILING DATE: 09-OCT-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/258,261 
FILING DATE: 08-JUN-1994 



ATTORNEY/AGENT INFORMATION:
NAME: Meigs, J. Timothy 
REGISTRATION NUMBER: 38,241 
REFERENCE/DOCKET NUMBER: 1506/CIP6 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (919) 541-8587
TELEFAX: (919) 541-8689 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 49377 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
ORIGINAL SOURCE: 

ORGANISM: Sorangium cellulosum 
IMMEDIATE SOURCE: 

CLONE: p98/1, pJL3, and pVKM15
FEATURE:
NAME/KEY: misc_feature
LOCATION: 383..760
OTHER INFORMATION: /product= "SorR"
OTHER INFORMATION: /note= "This gene encodes a protein that is highly homologous to
OTHER INFORMATION: the reductase domains of type I PKSs such as eryA from
OTHER INFORMATION: Saccharopolyspora erythraea."
FEATURE:
NAME/KEY: misc_feature
LOCATION: 927..19874
OTHER INFORMATION: /product= "SorA"
OTHER INFORMATION: /note= "Gene product is highly homologous to type I PKSs that
OTHER INFORMATION: are known to be involved in the synthesis of polyketide
OTHER INFORMATION: compounds."
FEATURE:
NAME/KEY: misc_feature
LOCATION: 942..7115
OTHER INFORMATION: /product= "Module 1 of SorA"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 7203..12884
OTHER INFORMATION: /product= "Module 2 of SorA"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 13455..19616
OTHER INFORMATION: /product= "Module 3 of SorA"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 19871..46318
OTHER INFORMATION: /product= "SorB"
OTHER INFORMATION: /note= "Gene product is highly homologous to type I PKS genes."
FEATURE:
NAME/KEY: misc_feature
LOCATION: 19870..24556
OTHER INFORMATION: /product= "Module 1 of SorB"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 24638..30820
OTHER INFORMATION: /product= "Module 2 of SorB"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 30881..35446
OTHER INFORMATION: /product= "Module 3 of SorB"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 35528..40114
OTHER INFORMATION: /product= "Module 4 of SorB"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 40190..46318
OTHER INFORMATION: /product= "Module 5 of SorB"
FEATURE:
NAME/KEY: misc_feature
LOCATION: 46851..47891
OTHER INFORMATION: /product= "SorM"
OTHER INFORMATION: /note= "The protein encoded by the sorM gene is highly
OTHER INFORMATION: homologous to the methyltransferase from Streptomyces
OTHER INFORMATION: hygroscopicus that is involved in the synthesis of the
OTHER INFORMATION: polyketide rappamicin."
US-08-764-233A-1 



Query Match 7.9%; Score 32.6; DB 1; Length 49377; 

Best Local Similarity 46.3%; Pred. No. 3.2;

Matches 107; Conservative 0; Mismatches 124; Indels 0; Gaps 0; 

Qy 101 cactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaacccagg 160 

I I I I I I I I I I I II I I II Til I I I III I I I I 

Db 12535 CCCTCCAAGCCCTCTTGGACTCCATCCCGAGCGCTCACCCGCTCACCGCCGTCGTCCACG 12594 

Qy 161 cccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgctcgca 220 

II I III I I I I I I I I I I I I I I I I I I 

Db 12595 CCGCGGGCGCCCTCGACGACGGCCTGCTCGGCGCCATGAGCCCCGAGCGCATCGACCGCG 12654

Qy 221 atcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaagtcgtga 280

I I I I I I I I I I I I I I I I I I I I I I I 

Db 12655 TCTTTGCCCCCAAGCTCGATGCTGCTTGGCACTTGCATGAGCTCACCCAAGACAAGCCCC 12714

Qy 281 agacaagcactgtcttcttccccttctatgcaggtatccttggatggccag 331 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 12715 TCGCCGCCTTCGTCCTCTTCTCGTCCGCTGCTGGCGTCCTTGGTAGTCCAG 12765 



RESULT 10 
US-08-998-416-882/C 

; Sequence 882, Application US/08998416 
; Patent No. 6239264 
; GENERAL INFORMATION: 

APPLICANT: Philippsen, Peter 



APPLICANT: Pohlmann, Rainer 
APPLICANT: Steiner, Sabine 
APPLICANT: Mohr, Christine 
APPLICANT: Wendland, Jurgen 
APPLICANT: Knechtle, Philipp
APPLICANT: Rebischung, Corinne 

TITLE OF INVENTION: GENOMIC DNA SEQUENCES OF ASHBYA GOSSYPII 
TITLE OF INVENTION: AND USES THEREOF 
NUMBER OF SEQUENCES: 1152 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Novartis Corporation
STREET: 3054 Cornwallis Road
CITY: Research Triangle Park
STATE: North Carolina
COUNTRY: USA
ZIP: 27709
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/998,416 
FILING DATE: 24-DEC-1997 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: CH 0016/97 
FILING DATE: 31-DEC-1996 
ATTORNEY/AGENT INFORMATION: 
NAME: Meigs, J. Timothy 
REGISTRATION NUMBER: 38,241 

REFERENCE/DOCKET NUMBER: PF/5-30306/A/CGC1976
TELECOMMUNICATION INFORMATION:
TELEPHONE: 919-541-8587
TELEFAX: 919-541-8689
INFORMATION FOR SEQ ID NO: 882: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 490 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear
MOLECULE TYPE: DNA (genomic)
ORIGINAL SOURCE:

ORGANISM: PAG1552UP
US-08-998-416-882 



Query Match 7.6%; Score 31.4; DB 4; Length 490; 

Best Local Similarity 48.5%; Pred. No. 0.77; 

Matches 83; Conservative 1; Mismatches 87; Indels 0; Gaps 0; 

Qy 209 ccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctcca 268 

I I I I II I I I I I I I I I III I I I I I I I I I I I I I I III 
Db 356 CCGTCATCCGCGACGCCGTCACCTACACCGAGCACGCCAAGCGCAAGACCGTCACCTCGC 297



Qy 269 agaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggtatccttggatggc 328
I I I I I I I I I II I III I I I I I I I I I

Db 296 TCGACGTCGTGTACGCGCTCAAGCGCCAGGGCCGCACATTGTACGGCTTCGGCGGCTGAG 237



Qy 329 cagtcgcagccgcctggtggttcaacggaaacatgtgactcttccaaatgg 379 

I I I I I I I I I I hllll III III II I 

Db 236 CCGCGTGCCGCGGCCGGCTATGTATAATATGTAYGTGAATCTGCCATATCG 186 



RESULT 11 
US-08-125-468-1/C 

Sequence 1, Application US/08125468 
Patent No. 5589385 
GENERAL INFORMATION: 

APPLICANT: Ryan, Michael J. 
APPLICANT: Lotvin, Jason A. 
APPLICANT: Strathy, Nancy 
APPLICANT: Fantini, Susan E. 

TITLE OF INVENTION: Cloning of the biosynthetic pathway for 
TITLE OF INVENTION: chlortetracycline and tetracycline Formation and cosmids

TITLE OF INVENTION: useful therein 
NUMBER OF SEQUENCES: 1 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: American Cyanamid Company 
STREET: One Cyanamid Plaza 
CITY: Wayne 
STATE: New Jersey 
COUNTRY: USA 
ZIP: 07470 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/125,468
FILING DATE: 22-SEP-1993 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Tsevdos, Estelle J 
REGISTRATION NUMBER: 31,145 
REFERENCE/DOCKET NUMBER: 31,255-02 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (201) 831-3241 
TELEFAX: (201) 831-3305 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 30001 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
US-08-125-468-1 



Query Match 7.6%; Score 31.2; DB 1; Length 30001; 

Best Local Similarity 45.0%; Pred. No. 7.1; 

Matches 117; Conservative 0; Mismatches 143; Indels 0; Gaps 



Qy 78 cgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccac 137 

II I I I I I I I I I I II II II I I I I I I I I 

Db 8059 CGTGATCGACACCAACCTCACCAGCGTCTTCCGCGTCACCCGCGAGGTCCTCACCACCGG 8000 

Qy 138 aacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggt 197 

III II III II I I I I I I I I I I I I I I I I 

Db 7999 CGGCATGGAGGCGGCCGGGCACGGGCGGATCATCAGCGTCGCCTCCACCGGCGGCAAGCA 7940

Qy 198 agagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgc 257

I III I I I I I I I I I I I I I I III III 

Db 7939 GGGTGTCCCGCTGGGCGCCCCCTACTCGGCCTCCAAGGCCGGCGTCATCGGCTTCACCAA 7880

Qy 258 cgacctctccaagaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggtat 317

I II I I I I I I I I I I ' I I I I I I I II I I I II I III 

Db 7879 GGCGCTGGCCAAGGAACTCGCCCACACCGGCACCACGGTCAACGCCGTCTGCCCCGGCTA 7820 

Qy 318 ccttggatggccagtcgcag 337 

I I I II! I II I 
Db 7819 CGTCGAGACGCCGATGGCCG 7800



RESULT 12 
US-08-474-933-1/C

Sequence 1, Application US/08474933 
Patent No. 5866410 
GENERAL INFORMATION: 



APPLICANT: Ryan, Michael J.
APPLICANT: Lotvin, Jason A.
APPLICANT: Strathy, Nancy
APPLICANT: Fantini, Susan E.
TITLE OF INVENTION: Cloning of the biosynthetic pathway for 
TITLE OF INVENTION: chlortetracycline and tetracycline Formation and cosmids

TITLE OF INVENTION: useful therein 
NUMBER OF SEQUENCES: 1 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: American Cyanamid Company 
STREET: One Cyanamid Plaza 
CITY: Wayne 
STATE: New Jersey 
COUNTRY: USA 
ZIP: 07470 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/474,933
FILING DATE: 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/125,468 
FILING DATE: 22-SEP-1993 
ATTORNEY/AGENT INFORMATION: 
NAME: Tsevdos, Estelle J 



REGISTRATION NUMBER: 31,145 

REFERENCE/DOCKET NUMBER: 31,255-02 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (201) 831-3241

TELEFAX: (201)831-3305 
; INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 30001 base pairs 

TYPE: nucleic acid

STRANDEDNESS: single 

TOPOLOGY: linear 
MOLECULE TYPE: DNA (genomic) 
US-08-474-933-1 



Query Match 7.6%; Score 31.2; DB 2; Length 30001; 

Best Local Similarity 45.0%; Pred. No. 7.1; 

Matches 117; Conservative 0; Mismatches 143; Indels 0; Gaps 

Qy 78 cgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccac 137 

I I I I I II I II I I II II II I I I I I I I I 

Db 8059 CGTGATCGACACCAACCTCACCAGCGTCTTCCGCGTCACCCGCGAGGTCCTCACCACCGG 8000 

Qy 138 aacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggt 197 

III II 111 II I I I I I I I I I I I I I I I I 

Db 7999 CGGCATGGAGGCGGCCGGGCACGGGCGGATCATCAGCGTCGCCTCCACCGGCGGCAAGCA 7940

Qy 198 agagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgc 257

I II I I II I I I I I I I I I I I III 111 

Db 7939 GGGTGTCCCGCTGGGCGCCCCCTACTCGGCCTCCAAGGCCGGCGTCATCGGCTTCACCAA 7880 

Qy 258 cgacctctccaagaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggtat 317 

I I ; I I I I I I I I I I III I I I I II I I I I I I III 

Db 7879 GGCGCTGGCCAAGGAACTCGCCCACACCGGCACCACGGTCAACGCCGTCTGCCCCGGCTA 7820

Qy 318 ccttggatggccagtcgcag 337 

III II I I I I I 

Db 7819 CGTCGAGACGCCGATGGCCG 7800 



RESULT 13 
US-09-056-105-9 

; Sequence 9, Application US/09056105

; Patent No. 6287569 

; GENERAL INFORMATION: 

; APPLICANT: KIPPS, THOMAS J. 

; APPLICANT: WU, YUNQI 

; TITLE OF INVENTION: VACCINES WITH ENHANCED INTRACELLULAR 

; TITLE OF INVENTION: PROCESSING 

; FILE REFERENCE: 233/221 

; CURRENT APPLICATION NUMBER: US/09/056,105 

; CURRENT FILING DATE: 1998-04-06 

; EARLIER APPLICATION NUMBER: 60/043,467 

; EARLIER FILING DATE: 1997-04-10 

; NUMBER OF SEQ ID NOS: 35

; SOFTWARE: FastSEQ for Windows Version 3.0 

; SEQ ID NO 9 



LENGTH: 11495 
TYPE: DNA 

ORGANISM: Homo sapiens
US-09-056-105-9 



Query Match 7.5%; Score 30.8; DB 4; Length 11495; 

Best Local Similarity 49.4%; Pred. No. 5.9; 

Matches 80; Conservative 0; Mismatches 82; Indels 0; Gaps 0; 

Qy 72 tcctagcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaac 131 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 604 tcccttccacactcctaacccaatccacaccctcatcccctaccagcaccccatcctccc 663 

Qy 132 caccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcgg 191 

I I I I I I I I I I I I I III III I I I I I I III 

Db 664 caaccccgtgccaccctcatacccccatccccaattcaacccccgcaccctcatccccca 723 

Qy 192 catggtagagccccaccccttcgctcgcaatcccatcaccat 233 

I I I I III I I I I I I I I I I I 

Db 724 ccccacacctgcacccccaccccccaacacccatacccccat 765 



RESULT 14 
US-08-752-760A-1 

; Sequence 1, Application US/08752760A 

; Patent No. 5877011 

; GENERAL INFORMATION: 

; APPLICANT: Armentano, Donna 

APPLICANT: Gregory, Richard J. 

APPLICANT: Smith, Alan E. 

TITLE OF INVENTION: CHIMERIC ADENOVIRAL VECTORS 
NUMBER OF SEQUENCES: 3 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Baker & Botts, L.L.P. 

STREET: 30 Rockefeller Plaza 

CITY: New York 

STATE: NY 

COUNTRY: U.S.A. 

ZIP: 10112 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Diskette 

COMPUTER: IBM Compatible 

OPERATING SYSTEM: DOS 

SOFTWARE: FastSEQ Version 2.0 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/752,760A

FILING DATE: 20-NOV-1996 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 

FILING DATE: 
ATTORNEY/AGENT INFORMATION: 

NAME: Seide, Rochelle K 

REGISTRATION NUMBER: 32,300 

REFERENCE/DOCKET NUMBER: A31385
TELECOMMUNICATION INFORMATION: 



TELEPHONE: 212-705-5000 
TELEFAX: 212-705-5020 
TELEX: 

; INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 35081 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 

US-08-752-760A-1 



Query Match 7.4%; Score 30.6; DB 2; Length 35081; 

Best Local Similarity 48.6%; Pred. No. 12; 

Matches 84; Conservative 0; Mismatches 89; Indels 0; Gaps 0; 

Qy 96 cccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagtaac 155 

I I I I I I I II. I I III I I I I I I I I III I I I I I I I 

Db 16373 CCGACCCCTGGCTCCCAGCCTCCACCGCTACCCTTCCACTTCTACCGTCGCCACGGTCAC 16432 

Qy 156 ccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttcgc 215 

I II I I I I I I I I I I I I I I I I I I I I I I- I 

Db 16433 CGAGCCTCCCAGGAGGCGAAGATGGGGCCCCGCCAACCGGCTGATGCCCAACTACGTGTT 16492

Qy 216 tcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctcca 268

I I I I I I I I I II I I I II I I I I II I III 

Db 16493 GCATCCTTCCATTATCCCGACGCCGGGCTACCGCGGCACCCGGTACTACGCCA 16545



RESULT 15 
US-09-103-840A-2 

; Sequence 2, Application US/09103840A 

; Patent No. 6294328 

; GENERAL INFORMATION: 

; APPLICANT: FLEISCHMAN, Robert D. 

; APPLICANT: WHITE, Owen R. 

; APPLICANT: FRASER, Claire M. 

; APPLICANT: VENTER, John C. 

; TITLE OF INVENTION: DNA SEQUENCES FOR STRAIN ANALYSIS IN MYCOBACTERIUM 

; TITLE OF INVENTION: TUBERCULOSIS 

; FILE REFERENCE: 24366-20007.00

; CURRENT APPLICATION NUMBER: US/09/103,840A

; CURRENT FILING DATE: 1998-06-24

; NUMBER OF SEQ ID NOS: 2

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 2 

LENGTH: 4403765 

TYPE: DNA 

; ORGANISM: Mycobacterium tuberculosis 
FEATURE: 

OTHER INFORMATION: CDC 1551 

OTHER INFORMATION: "n" bases at various positions throughout the sequence 
OTHER INFORMATION: represent a, t, c or g 
US-09-103-840A-2 



Query Match 7.4%; Score 30.6; DB 4; Length 4403765;



Best Local Similarity 57.6%; Pred. No. 48; 

Matches 57; Conservative 0; Mismatches 42; Indels 0; Gaps 0; 

Qy 183 cgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctca 242

III II' I I I I I I I I I I I III II I I I I III I I I I I 
Db 2765925 caccaccgtggtgttcgacgcactccccggcgccgacacggtcatcgacatctccgccca 2765984

Qy 243 cgcctggcgcgccgccgacctctccaagaaagtcgtgaa 281 

III I I I I I I I I I II I II III I II I 
Db 2765985 caccgtgcgccgcgccagcctcaacgaccaagacctgga 2766023 



Search completed: February 7, 2002, 11:22:20 
Job time: 7906 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd.



OM nucleic - nucleic search, using sw model 



Run on: February 7, 2002, 08:20:54 ; Search time 4942.22 Seconds 

(without alignments) 
893.630 Million cell updates/sec 

Title: US-09-394-745-6603

Perfect score: 411 

Sequence: 1 agcaaaagcatagagatcca aggagaagaggaagggaccg 411

Scoring table: IDENTITY_NUC 

Gapop 10.0 , Gapext 1.0 



Searched: 11351937 seqs, 5372889281 residues 

Total number of hits satisfying chosen parameters: 22703874 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : EST:* 

1: em_estfun:* 
2: em_esthum:* 
3: em_estin:* 
4: em_estom:* 
5: em_estpl:* 
6: em_estba:* 
7: em_estro:* 
8: em_estov:* 
9: em_htc:* 
10: gb_est1:*
11: gb_est2:* 
12: gb_htc:* 

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
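
The percentages printed with each hit in this listing can be cross-checked from the other reported fields. The sketch below assumes, as an inference from the numbers in this listing rather than from GenCore documentation, that Query Match is the score divided by the perfect score (411 for this query) and that Best Local Similarity is the match count divided by the number of aligned columns. The function names are illustrative only, and since every hit shown here reports Indels 0, the way indels enter the denominator is a guess.

```python
from decimal import Decimal, ROUND_HALF_UP

# "Perfect score" as printed in the run header for this query.
PERFECT_SCORE = Decimal("411")


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    value = Decimal(str(numerator)) * 100 / Decimal(str(denominator))
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def query_match(score, perfect_score=PERFECT_SCORE):
    """Assumed relation: Query Match % = Score / Perfect score."""
    return pct(score, perfect_score)


def best_local_similarity(matches, conservative, mismatches, indels=0):
    """Assumed relation: Matches / (all aligned columns).

    Counting indels in the denominator is an assumption; the hits in
    this listing all have Indels 0, so they cannot confirm it.
    """
    aligned = matches + conservative + mismatches + indels
    return pct(matches, aligned)


if __name__ == "__main__":
    # Figures taken from the US-08-941-936-1 hit in this listing.
    print(query_match("34.2"), best_local_similarity(66, 0, 53))
```

For example, a score of 34.2 against a perfect score of 411 gives 8.3, and 66 matches with 53 mismatches gives 55.5, which are the values printed for US-08-941-936-1.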

SUMMARIES
                  %
Result          Query
   No.   Score  Match  Length  DB  ID         Description
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53.4 


13 


0 
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1 1 


BI1904 40 
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ile09fs.r 
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O O A 
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1 0 


CNSOOdUU 
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Drosophil 
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7 


n "3 0 
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LNbUUCNCj 
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Drosophil 
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5 
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CNS007 1A 


AL066286 


Drosophil 
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4 
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8 
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CNS00LT2 
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4 94 
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CNS016KT 
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Drosophil 
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Drosophil 
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AL106910 


Drosophil 
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41.8 


10 


2 
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13 


CNS011EQ 
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Drosophil 
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41.6 
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1 
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AL141958 


Anopheles 
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18 


40.8 


9 


9 
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9 


9 
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9 


9 
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13 


CNS00LO0 
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Drosophil 


c 
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8 
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13 
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AZ199083 


SP_1038_B 


c 


22 


4 0.4 


9 


8 
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13 


CNS016P9 


AL107031 


Drosophil 
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8 


1101 


13 


CNS00FXE 


AL071370 


Drosophil 
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9 


8 
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13 
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B21282 T20G4-T7.1 
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40.2 


9 


8 
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13 


CNS018FL 


AL109275 


Drosophil 


c 
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8 
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13 


CNS015SQ 
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Drosophil 
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8 
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BG180518 
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7 
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SP_1032_A 
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7 


932 


13 
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AL403502 


T7 end of 


c 
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7 
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11 
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HVSMEbOOO 
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9 


7 
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13 
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Tetraodon 
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9 


7 
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AL536103 
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9 


7 
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13 


CNS00ZZK 


AL098330 


Drosophil 


c 
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9 


6 
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11 


BG786286 


BG786286 


SEAUMC006 




35 


39.6 


9 


6 
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13 


CNS0766T 


AL431019 


T7 end of 




    36    39.6    9.6   1101  13  CNS016XR    AL107337 Drosophil
    37    39.4    9.6    638  13  CNS046XY    AL277279 Tetraodon
    38    39.4    9.6   1046  13  CNS07BWV    AL438437 T7 end of
c   39    39.4    9.6   1063  13  CNS04OFA    AL268831 Tetraodon
c   40    39.4    9.6   1101  13  CNS017ZR    AL108705 Drosophil
c   41    39.2    9.5    363  10  AL534163    AL534163 AL534163
    42    39      9.5    544  13  CNS015XA    AL106024 Drosophil
c   43    39      9.5    798  13  CNS02PA9    AL207738 Tetraodon
c   44    39      9.5    823  10  AL573901    AL573901 AL573901
    45    39      9.5    880  11  BF298281    BF298281 016PbB02



ALIGNMENTS 



RESULT 1
BI190440
LOCUS       BI190440      435 bp    mRNA            EST       10-JUL-2001
DEFINITION  ile09fs.rl Fusarium sporotrichioides Tri 10 overexpressed cDNA
            library Fusarium sporotrichioides cDNA clone ile09fs 5', mRNA
            sequence.
ACCESSION   BI190440
VERSION     BI190440.1  GI:14664119
KEYWORDS    EST.
SOURCE      Fusarium sporotrichioides.
  ORGANISM  Fusarium sporotrichioides
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes;
            Hypocreales; mitosporic Hypocreales; Fusarium.
REFERENCE   1  (bases 1 to 435)
  AUTHORS   Ren,Q., Tag,A., Peplow,A., Lai,H., Kupfer,C., Peterson,A.,
            Beremand,M. and Roe,B.
  TITLE     Analysis of a Fusarium sporotrichioides EST database
  JOURNAL   Unpublished (2001)
COMMENT     Contact: Bruce A. Roe, University of Oklahoma, broe@ou.edu
            Department of Chemistry and Biochemistry
            Advanced Center for Genome Technology, University of Oklahoma
            620 Parrington Oval, Norman, OK 73019, USA
            Tel: 405 325 4912
            Fax: 405 325 7762
            Email: broe@ou.edu
            Contact Dr. Marian Beremand regarding clone availability Included
            is the best homolog from a blastx search of Genbank nr 04-09-01
            155 6e-10 gi|12718428|emb|CAC2 (AL513462) putative protein
            [Neurosporacra
            Seq primer: T3
            High quality sequence stop: 395.
FEATURES             Location/Qualifiers
     source          1..435
                     /organism="Fusarium sporotrichioides"
                     /strain="Tri 10"
                     /db_xref="taxon:5514"
                     /clone="ile09fs"
                     /clone_lib="Fusarium sporotrichioides Tri 10 overexpressed
                     cDNA library"
                     /note="Vector: pBlueScript SK-; Site_1: EcoRI; Site_2:
                     XhoI; 5' end of cDNA cloned into EcoRI site of pBluescript
                     ; 3' end of cDNA cloned into XhoI site of pBluescript"
BASE COUNT      107 a    138 c     76 g    114 t
ORIGIN



Query Match 13.0%; Score 53.4; DB 11; Length 435; 

Best Local Similarity 55.7%; Pred. No. 0.0018; 

Matches 102; Conservative 0; Mismatches 81; Indels 0; Gaps 0 



Qy 184 gcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcac 243 

Mi ll I I I I I I I I I I I I I I I I I I I I I I I I I I I II III 

Db 189 GCCGGCCGAGCTATGGAGTCTCACCCCTTCGAGCGTATTCCCCTCACTCAGAAGCCTGCT 248 

Qy 244 gcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactgtcttcttcccc 303 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 249 TCTCCTGATTACGCCAAGATGTTCAAGCGAGTTGGCAGCCAGGCCCTCTTCTTCTTCCCT 308 

Qy 304 ttctatgcaggtatccttggatggccagtcgcagccgcctggtggttcaacggaaacatg 363 

II III I I I I I I I I I I I I I I I I I I I I II I I I I I I II 

Db 309 GGCTTCGCTGTCATCCTTGGCTGGCCTTTGGCTGCCCAGTATGCCTTTGACGGTAAACTG 368 

Qy 364 tga 366 
I I 

Db 369 TAA 371 
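
The summary figures printed for each result appear to follow simple arithmetic. The sketch below checks this against the Result 1 figures. Both formulas are assumptions inferred from the numbers in this report, not from the search program's documentation: Query Match is taken as the score divided by the perfect score (411, from the report header), and Best Local Similarity as matches divided by the total number of aligned columns. The class and method names are hypothetical.

```python
from dataclasses import dataclass

# "Perfect score" as printed in the report header for this query.
PERFECT_SCORE = 411


@dataclass
class HitStats:
    """Per-result figures copied from the report (hypothetical container)."""
    score: float
    matches: int
    conservative: int
    mismatches: int
    indels: int

    def query_match_pct(self, perfect: float = PERFECT_SCORE) -> float:
        # Assumed formula: score as a percentage of the perfect score.
        return 100.0 * self.score / perfect

    def best_local_similarity_pct(self) -> float:
        # Assumed formula: exact matches over all aligned columns.
        columns = (self.matches + self.conservative
                   + self.mismatches + self.indels)
        return 100.0 * self.matches / columns


# Figures for Result 1 (BI190440) as printed above.
result1 = HitStats(score=53.4, matches=102, conservative=0,
                   mismatches=81, indels=0)

print(f"Query Match {result1.query_match_pct():.1f}%")
print(f"Best Local Similarity {result1.best_local_similarity_pct():.1f}%")
```

Under these assumptions the two values round to the 13.0% and 55.7% printed for Result 1.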



RESULT 2
CNS006U0/C
LOCUS       CNS006U0      884 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence T7 end of BAC #
            BACR14N21 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL065923
VERSION     AL065923.1  GI:4944891
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 884)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..884
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR14N21"
                     /note="end : T7"
BASE COUNT      230 a     62 c    139 g    124 t    329 others
ORIGIN



Query Match 12.8%; Score 52.6; DB 13; Length 884; 

Best Local Similarity 15.6%; Pred. No. 0.0034; 

Matches 40; Conservative 119; Mismatches 97; Indels 0; Gaps 0; 

Qy 18 ccatcttctctgctcaatcaattacacaacaagagcattctagatttgagttcatcctag 77 

: : : | : : : | : : : : : : :::::::::::: : : : : : : : : 

Db 876 MMHTKKKKKTTHMMVMNMMMMMMMMMMMMMMMMMMMMMMMGMMMMMMMMMMMAMMMMMMM 817 

Qy 78 cgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccac 137 

: : : : : : : ::::::: :::::::: : : : : : : : : : | | : : | : : | | : : : : : | : : : 
Db 816 MMMMMMMMMMMMMMMMMMMMMMMMMMMMAMMCMMMCMMMACCMMCMAMMCAMMMMMCMMM 757 

Qy 138 aacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggt 197 

Db 756 MMMMMMCMMMMMCMMCMCCMMMCCMCMCCMMMMACACMACACMCACCMMMMACSMMMAMC 697 

Qy 198 agagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgc 257 

: : : I : : : : : : : : I I : : I I I I : : I I ! i I I I : I : I I I I 

Db 696 MCMCMACMMMMMMMMAAMMMCCACHMCCAMACCMMMCCCCCCMCACCMCMMMMCCCCCCC 637 

Qy 258 cgacctctccaagaaa 273 

I : : : I ! I : I I : 
Db 636 CMMMMACAACAMCAAM 621 



RESULT 3
CNS00CNG
LOCUS       CNS00CNG      939 bp    DNA             GSS       04-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR26H16 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL059400
VERSION     AL059400.1  GI:4946964
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 939)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..939
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR26H16"
                     /note="end : TET3"
BASE COUNT       71 a    349 c    104 g    180 t    235 others
ORIGIN



Query Match 12.7%; Score 52; DB 13; Length 939; 

Best Local Similarity 16.0%; Pred. No. 0.0049; 

Matches 40; Conservative 119; Mismatches 91; Indels 0; Gaps 

Qy 18 
Db 378 

Qy 78 
Db 438 

Qy 138 
Db 498 

Qy 198 
Db 558 

Qy 258 
Db 618 

RESULT 4
CNS0071A/C
LOCUS       CNS0071A      895 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR14B09 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL066286
VERSION     AL066286.1  GI:4945153
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 895)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..895
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR14B09"
                     /note="end : TET3"
BASE COUNT      124 a     80 c    204 g    179 t    308 others
ORIGIN



Query Match 12.5%; Score 51.2; DB 13; Length 895; 

Best Local Similarity 29.6%; Pred. No. 0.0078; 

Matches 80; Conservative 71; Mismatches 119; Indels 0; Gaps 

Qy 4 aaaagcatagagatccatcttctctgctcaatcaattacacaacaagagcattctagatt 63 

: : I : I I : I I : : : II : : I : I : II : : : : I : I 

Db 650 MMAMCCAMMCACAMMAMCMMCCMCCCACMMCACCCMMMCMCAMAMMAMCACMMCAMCACM 591 

Qy 64 tgagttcatcctagcgataccaatacacccatcccaacactccaaaccaaccaacacttc 123 

: : : : I I I : : : I I I I I I : :-: I I : I : : I : : : : I I : I I : : I I : : 
Db 590 MMMAMCMCAMMMAMCMAMMMMAAMACACMMMMCCAMAMMCMMMMMACMCACMMCCAMMAM 531 

Qy 124 aaccaaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtc 183 

: : : : I : : : I : : : I : : | : | | | : : : : : : | | : : : . : | | 

Db 530 MMMMAMMMCCMMMCMMCMCMCCCCCCMMMMMMMMAMCCAMMAAAMMCMCMCMCCCMCMAC 471 

Qy 184 gcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcac 243 
II: : : I I I I I I I I : : I : III II I I I I I : I 

Db 470 CACCMMCAMMMAACCACCCCCCCCCCCCCMYMCCCMCCACCCCACCCACCCCCCCCMCMC 411 



Qy 244 gcctggcgcgccgccgacctctccaagaaa 273 

II I I I I I I : I : I I I I I : I 
Db 410 CCCCCCCMCACCCCCCMCMACCCCAACMCA 381 



RESULT 5
BG333443
LOCUS       BG333443      996 bp    mRNA            EST       27-FEB-2001
DEFINITION  602430365F1 NIH_MGC_18 Homo sapiens cDNA clone IMAGE:4547993 5',
            mRNA sequence.
ACCESSION   BG333443
VERSION     BG333443.1  GI:13139881
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 996)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: DCTD/DTP/Gazdar
            cDNA Library Preparation: Ling Hong/Rubin Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLCM123*7 row: a column: 18
            High quality sequence start: 23
            High quality sequence stop: 483.
FEATURES             Location/Qualifiers
     source          1..996
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:4547993"
                     /clone_lib="NIH_MGC_18"
                     /tissue_type="large cell carcinoma"
                     /lab_host="DH10B (phage-resistant)"
                     /note="Organ: lung; Vector: pOTB7; Site_1: XhoI; Site_2:
                     EcoRI; cDNA made by oligo-dT priming. Directionally cloned
                     into EcoRI/XhoI sites using the following 5' adaptor:
                     GGCACGAG(G). Library constructed by Ling Hong in the
                     laboratory of Gerald M. Rubin (University of California,
                     Berkeley) using ZAP-cDNA synthesis kit (Stratagene) and
                     Superscript II RT (Life Technologies). Note: this is a
                     NIH_MGC Library."
BASE COUNT      171 a    465 c    173 g    187 t
ORIGIN



Query Match 11.4%; Score 46.8; DB 11; Length 996; 

Best Local Similarity 52.0%; Pred. No. 0.1; 



Matches 105; Conservative 0; Mismatches 97; Indels 0; Gaps 0 



Qy 66 agttcatcctagcgataccaatacacccatcccaacactccaaaccaaccaacacttcaa 125 

III III I I I I I I I I I II II I I I I I I I I I I I 

Db 783 ATTGCCCCCTCCCCCTCCCACCACCCACCCTCCCCACATCTGTCCCACCCCCCACCTCAC 842 

Qy 126 ccaaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgc 185 

II I I I I I I I I I I I I I I I I I I III I I I I II I II 

Db 843 CCCCACTACACCACCTAACCCCTCATCACCCCACCCCCCCCCCACGCCCCCTCCCCCCAC 902 

Qy 186 ccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgc 245 

III II I I I I I I I I I I I I I I III I I I I I I I I I I 

Db 903 CCCCCACCTCCCCCACACCCACCCCCTCCCCCACCCCCCCCCCCCCCTTCCTCCCCCCCC 962 

Qy 246 ctggcgcgccgccgacctctcc 267 

I I I I I I II I I I 

Db 963 CCCCTCCCCCCCCCCCCCCCCC 984 



RESULT 6
CNS00LT2/C
LOCUS       CNS00LT2     1101 bp    DNA             GSS       14-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR48P19 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL078714
VERSION     AL078714.1  GI:5102004
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1101)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUN-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..1101
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR48P19"
                     /note="end : TET3"
BASE COUNT      469 a      6 c     69 g    151 t    406 others
ORIGIN



Query Match 10.8%; Score 44.2; DB 13; Length 1101; 

Best Local Similarity 13.0%; Pred. No. 0.48; 

Matches 39; Conservative 141; Mismatches 121; Indels 0; Gaps 0; 

Qy 3 caaaagcatagagatccatcttctctgctcaatcaattacacaacaagagcattctagat 62 

:::::: I :::::: : | | : : : : : | : ::::::::: : I : : i : I 
Db 1032 MMMMMMMAMMMMAMMMMMMMMMATTTTTHMMMAMAMMMHMMMMMMMMATTHAWHTTTTHT 973 



Qy 63 ttgagttcatcctagcgataccaatacacccatcccaacactccaaaccaaccaacactt 122 

II : |:| : | : : : : : : | : I : : : M : : I : : I : : : I : I : : : : : : : : I I 
Db 972 TTYMMAMCMTTHTMMMMMMMMMAMMAMMMCCMCMCMMCCMMMMCCCCMCMMMMMCMMMTT 913 



Qy 123 caaccaaaccaccacaacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgt 182 
Db 912 TTHHHMMCCMCMCMCCCMCMCMMCMMMMACMMMMMTTTHMMCCCMMMMMMMMMMAAMMMA 853 



Qy 183 cgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgacccctca 242 

Db 852 MMMMMCMHMAMMTTTTTTTTHWMAMMAYHTTMMMMTTMMMMMMCMMMMAAAATTMMMMMM 793 

Qy 243 cgcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagcactgtcttcttccc 302 

Db 792 AMHMTHHHCTMMMCMMCWCCMCMMMMMMMMCCMMCMMCMCMMMCCMMYHMMHTTTWTTMM 733 

Qy 303 c 303 

Db 732 H 732 



RESULT 7
AA415063
LOCUS       AA415063      494 bp    mRNA            EST       09-DEC-1999
DEFINITION  Mg0008 RCW Lambda Zap Express Library Magnaporthe grisea cDNA clone
            RCW8 3', mRNA sequence.
ACCESSION   AA415063
VERSION     AA415063.1  GI:2537242
KEYWORDS    EST.
SOURCE      Magnaporthe grisea.
  ORGANISM  Magnaporthe grisea
            Eukaryota; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes;
            Sordariomycetes incertae sedis; Magnaporthaceae; Magnaporthe.
REFERENCE   1  (bases 1 to 494)
  AUTHORS   Wu,S.-C., Bernstein,B.D., Darvill,A.G. and Albersheim,P.
  TITLE     Expressed sequence tags of the rice blast fungus grown on rice cell
            walls
  JOURNAL   Unpublished (1997)
COMMENT     Contact: Sheng-Cheng Wu
            CCRC
            University of Georgia
            220 Riverbend Road, Athens, GA 30602-4712, USA
            Tel: 706 542 4446
            Fax: 706 542 4412
            Email: wusc@bscr.uga.edu
            Identical to Mg0040, Mg0046
            Seq primer: T7.
FEATURES             Location/Qualifiers
     source          1..494
                     /organism="Magnaporthe grisea"
                     /strain="CP987"
                     /db_xref="taxon:148305"
                     /clone="RCW8"
                     /clone_lib="RCW Lambda Zap Express Library"
                     /tissue_type="Mycelium"
                     /dev_stage="Day 5 post-inoculation"
                     /note="Vector: Lambda Zap; Messenger RNAs prepared from
                     Magnaporthe grisea grown at 23C in the dark with constant
                     gyratory shaking (100 rpm) in Vogel's medium containing
                     0.5% isolated rice cell walls as the sole carbon source"
BASE COUNT      136 a    124 c    115 g    119 t
ORIGIN



Query Match 10.6%; Score 43.6; DB 10; Length 494; 

Best Local Similarity 52.8%; Pred. No. 0.56; 

Matches 94; Conservative 0; Mismatches 84; Indels 0; Gaps 0; 

Qy 197 tagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccg 256 

I ! I I I 1 I I I I I I I I I II I I I I II I II I I I I I I I I 

Db 35 TTGAGCCCCATCCCTTCCAGCGTCTGCCCAACACGCAGAAGCCGCAAGCTGCCGACTATG 94 

Qy 257 ccgacctctccaagaaagtcgtgaagacaagcactgtcttcttccccttctatgcaggta 316 

III I I I I I I II I I I M II II I I I I I I I I II I 

Db 95 CCAAGATCTTCAGGCGTTCCGGCAAGACTGTCATGATCTACTTCCCTGGCATGGCCTTGA 154 

Qy 317 tccttggatggccagtcgcagccgcctggtggttcaacggaaacatgtgactcttcca 374 

I I II I I I I I I I II II I I . I III I I I I I III 

Db 155 TCCTAGGTTGGCCTGTGATCGCTCAGAAGATGGTTGATGGCCACGTCTAAGGTCGCCA 212 



RESULT 8
BF207170
LOCUS       BF207170      830 bp    mRNA            EST       06-NOV-2000
DEFINITION  601870887F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4101692 5',
            mRNA sequence.
ACCESSION   BF207170
VERSION     BF207170.1  GI:11100756
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 830)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: ATCC
            cDNA Library Preparation: Ling Hong/Rubin Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at: image.llnl.gov
            Plate: LLCM973 row: m column: 21
            High quality sequence stop: 343.
FEATURES             Location/Qualifiers
     source          1..830
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:4101692"
                     /clone_lib="NIH_MGC_19"
                     /tissue_type="neuroblastoma"
                     /lab_host="DH10B (phage-resistant)"
                     /note="Organ: brain; Vector: pOTB7; Site_1: XhoI; Site_2:
                     EcoRI; cDNA made by oligo-dT priming. Directionally
                     cloned into EcoRI/XhoI sites using the following 5'
                     adaptor: GGCACGAG(G). Library constructed by Ling Hong
                     in the laboratory of Gerald M. Rubin (University of
                     California, Berkeley) using ZAP-cDNA synthesis kit
                     (Stratagene) and Superscript II RT (Life Technologies).
                     Note: this is a NIH_MGC Library."
BASE COUNT      145 a    460 c    152 g     73 t
ORIGIN



Query Match 10.6%; Score 43.6; DB 11; Length 830; 

Best Local Similarity 52.2%; Pred. No. 0.64; 

Matches 97; Conservative 0; Mismatches 89; Indels 0; Gaps 0; 

Qy 82 accaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaaca 141 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II 
Db 590 ACCACCCCCCCCCCCCCCACCCCACACACCCACCACCCCAACCCCCGCCCCATCACACCA 649 

Qy 142 atgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagag 201 

III I I I I I I I I I I II I II I I 

Db 650 TCACCTCACGCCACCCACCACCAGCCCGCCCCCCGCCCCCCGCCACCACCCCCCCCACCA 709 

Qy 202 ccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgac 261 

I I II I I I I I I I I I III III I I I I I I I I I I I I I I I I I 
Db 710 CCCCTCCCCCACCCCCGCCCACCCCCCCCCCCCACCCCCCCCCCCACACCCCCCCCCCAC 769 



Qy 262 ctctcc 267 

I I I I 
Db 770 CCCCCC 775 



RESULT 9
CNS016KT
LOCUS       CNS016KT     1013 bp    DNA             GSS       26-JUL-1999
DEFINITION  Drosophila melanogaster genome survey sequence SP6 end of BAC
            BACN16J16 of DrosBAC library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL106871
VERSION     AL106871.1  GI:5624218
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Plasmid Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1013)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-JUL-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the European Drosophila Genome Project (EDGP) -
            http://www.edgp.ebi.ac.uk -. This Drosophila melanogaster BAC
            library (Dros BAC) was made by Alain Billaud at CEPH (Centre
            d'Etude du Polymorphisme Humain) with funding provided by a MRC
            project grant. The DNA was prepared from embryos by Alain Bucheton
            and Genevieve Payan. It has been constructed in the vector
            pBeloBAC11.
FEATURES             Location/Qualifiers
     source          1..1013
                     /organism="Drosophila melanogaster"
                     /plasmid="pBeloBAC11"
                     /db_xref="taxon:7227"
                     /clone_lib="DrosBAC"
                     /clone="BACN16J16"
                     /note="end : SP6"
BASE COUNT      132 a    191 c    148 g    131 t    411 others
ORIGIN



Query Match 10.6%; Score 43.6; DB 13; Length 1013; 

Best Local Similarity 16.3%; Pred. No. 0.67; 

Matches 53; Conservative 111; Mismatches 162; Indels 0; Gaps 

Qy 22 
Db 221 

Qy 82 
Db 281 

Qy 142 
Db 341 

Qy 202 
Db 401 

Qy 262 
Db 461 

Qy 322 ggatggccagtcgcagccgcctggtg 347 

: : I I I I : : I : : : : : I I 
Db 521 KKCTGTCTGGKSGSCCBGBSGCSTTG 546 



RESULT 10
CNS0073W/C
LOCUS       CNS0073W      922 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR14D09 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL066784
VERSION     AL066784.1  GI:4945247
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 922)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..922
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR14D09"
                     /note="end : TET3"
BASE COUNT      223 a     95 c    109 g    221 t    274 others
ORIGIN



Query Match 10.4%; Score 42.8; DB 13; Length 922; 

Best Local Similarity 22.2%; Pred. No. 1; 

Matches 65; Conservative 105; Mismatches 119; Indels 4; Gaps 



Qy 1 agcaaaagcatagagatccatcttctctgctcaatcaattacacaacaagagcattctag 60 

I : : : I : : : : | : : | | : | : : | : : : : : : : I : H | : : : 



Db 855 AMNMMMACMMMMCMMACMMAMCCMMACMMMAMAMMMMMMMMAMMAMCACMAMMMACACMC 796 



Qy 61 atttgagttcatcctagcgataccaatacacccatcccaacactccaaaccaaccaacac 120 

I : : : : : I : : I I I : : I I I : : I I : : I : : : : I : I : : I : I I : : : 

Db 795 AMMMMCMMMMMMMMCMMCMMCMCCACMMMACACMAM 736 

Qy 121 ttcaaccaaaccaccacaacaatgccttcagtaaccc aggcccgtctcatgtggcg 176 

I I : : : I I : : : : : I : I I : : I : : : : : | | : | : : | : : 

Db 735 MACAMMAMAAMMMMMAMAAMMAAMMAAMMMMAMMCCMCCMAAMAMAMMACMMCMMCAMMM 676 

Qy 177 tagcgtcgcccgcggcatggtagagccccaccccttcgctcgcaatcccatcaccatgac 236 

: I : : : : : : I I : I : I I I : : : : : : : I : : : I : I : : : I 

Db 675 MMMCAMMMMMAMMAMMMCCCAACAMMCMCACMMMCMAAMAMMMMACMMMAMMAAMMACMC 616 

Qy 237 ccctcacgcctggcgcgccgccgacctctccaagaaagtcgtgaagacaagca 289 

: I I I I I : I I : I I : : : I : : : I : I : I : : I : I 

Db 615 AAMMCACAAAMMACMCMMCMCASACACAMVMRAAMMMMCCCASAMAAMMAMMA 563 



RESULT 11
CNS016LW
LOCUS       CNS016LW     1101 bp    DNA             GSS       26-JUL-1999
DEFINITION  Drosophila melanogaster genome survey sequence T7 end of BAC
            BACN16J16 of DrosBAC library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL106910
VERSION     AL106910.1  GI:5624430
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Plasmid Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1101)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-JUL-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the European Drosophila Genome Project (EDGP) -
            http://www.edgp.ebi.ac.uk -. This Drosophila melanogaster BAC
            library (Dros BAC) was made by Alain Billaud at CEPH (Centre
            d'Etude du Polymorphisme Humain) with funding provided by a MRC
            project grant. The DNA was prepared from embryos by Alain Bucheton
            and Genevieve Payan. It has been constructed in the vector
            pBeloBAC11.
FEATURES             Location/Qualifiers
     source          1..1101
                     /organism="Drosophila melanogaster"
                     /plasmid="pBeloBAC11"
                     /db_xref="taxon:7227"
                     /clone_lib="DrosBAC"
                     /clone="BACN16J16"
                     /note="end : T7"
BASE COUNT      222 a     80 c    146 g    113 t    540 others
ORIGIN



Query Match 10.2%; Score 42; DB 13; Length 1101; 

Best Local Similarity 11.8%; Pred. No. 1.7; 

Matches 41; Conservative 128; Mismatches 173; Indels 4; Gaps 1 

Qy 34 atcaattacacaacaagagcattctagatttgagttcatcctagcgataccaatacaccc 93 

Db 517 MNNNNNNNMNNMNMNMMMGGMGNNNNNNNMCNMMTNNNNMMMNNMMNMMNTNNNTNMNTT 576 

Qy 94 atcccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaatgccttcagta 153 

Db 577 NNMMMMMNMCMMMMMMMMMCMMMMMMMMMCMCNMMMMCCMMMNTNTCNTNNMNNNNNMNN 636 

Qy 154 acccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagccccaccccttc 213 

Db 637 MNMMMMMMMMMCMNMGNCNNNMNNNCNCNNMMNCNCNNMMCMMCMNMCMMMNMMNNNMCN 696 

Qy 214 gctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacctctccaagaaa 273 

: : : : : I i : : : I : : I : : : | : | I : : I : : : : I : : : I : : 

Db 697 NNNMNMMMNMCCMNMMCMMCGMHGKMTGTMTKTTTMWTT-GHMGMHHGMMMNTVMMGGWMT 756 

Qy 274 gtcgtgaagacaagcactgtcttcttccccttctatgcaggtatccttggatggccagtc 333 

II:: : I : : : I I : I : : : : : : I : I : : : : I I : : : : I : I : 

Db 757 GTMMKMKMGHMTG WGTGKTMGHMMMMNMDTMTKMVMKTTTMMAGMMRNVGMGAKGT 812 

Qy 334 gcagccgcctggtggttcaacggaaacatgtgactcttccaaatgg 379 

: I : : : : : I I : I : : : I : I I : I : : : I I : 

Db 813 MTWGVMVRKMMVDMGTRGTGHGHGMMKWTVTGMGTAKKGTNGMTGD 858 



RESULT 12
CNS011EQ
LOCUS       CNS011EQ      748 bp    DNA             GSS       26-JUL-1999
DEFINITION  Drosophila melanogaster genome survey sequence SP6 end of BAC
            BACN06J14 of DrosBAC library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL100172
VERSION     AL100172.1  GI:5611783
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Plasmid Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 748)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-JUL-1999) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the European Drosophila Genome Project (EDGP) -
            http://www.edgp.ebi.ac.uk -. This Drosophila melanogaster BAC
            library (Dros BAC) was made by Alain Billaud at CEPH (Centre
            d'Etude du Polymorphisme Humain) with funding provided by a MRC
            project grant. The DNA was prepared from embryos by Alain Bucheton
            and Genevieve Payan. It has been constructed in the vector
            pBeloBAC11.
FEATURES             Location/Qualifiers
     source          1..748
                     /organism="Drosophila melanogaster"
                     /plasmid="pBeloBAC11"
                     /db_xref="taxon:7227"
                     /clone_lib="DrosBAC"
                     /clone="BACN06J14"
                     /note="end : SP6"
BASE COUNT      174 a    349 c     35 g     62 t    128 others
ORIGIN



Query Match 10.2%; Score 41.8; DB 13; Length 748; 

Best Local Similarity 44.1%; Pred. No. 1.8; 

Matches 100; Conservative 15; Mismatches 112; Indels 0; Gaps 0; 

Qy 82 accaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaaca 141 

11:1 : I I I I I I I I I I I III I I I I II I I I I I I 1 

Db 134 ACSACCCMSCCCACACCACCSCCCCSCACCCCCACSACCCC7VAACCCCCCCAACACCCCC 193 

Qy 142 atgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagag 201 

II II I : : I I I : : : II I I I I I I I El I 

Db 194 CACCCCSCAASCASSCAACCSSSCACCAACCCCCCCCCCAACCCCCCCCCCACCCCCCAA 253 

Qy 202 ccccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgac 261 

I I I I I I I I I : I II I I I I II I : II I 1:11 : I : I : I I I 
Db 254 CCCCACCCCAAMCCCACCACCCCCACCACSCCCACAACCASCSCCACSAACCSCSCCCCC 313 

Qy 262 ctctccaagaaagtcgtgaagacaagcactgtcttcttccccttcta 308 

I I I I 1:1 I I I I I I I llhllll 
Db 314 ACCSCCCACSCASCCASCACCCCAACCACCCACCCCCCCSCCCCCAA 360
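
The percentages in each hit summary appear to be simple ratios of the other reported numbers. The Python sketch below reproduces them under two assumptions inferred from the figures in this report rather than taken from the search program's documentation: Query Match is the hit score divided by the perfect score (411 for this query), and Best Local Similarity is the match count divided by the number of aligned columns (matches + conservative + mismatches + indels). The function names are illustrative only.

```python
def query_match(score, perfect_score):
    """Hit score as a percentage of the query's perfect score (assumed definition)."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical columns as a percentage of all aligned columns (assumed definition)."""
    aligned = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned


# Values copied from RESULT 12 (Score 41.8; Matches 100; Conservative 15;
# Mismatches 112; Indels 0) and the perfect score of 411 in the report header.
print(f"Query Match           {query_match(41.8, 411):.1f}%")
print(f"Best Local Similarity {best_local_similarity(100, 15, 112, 0):.1f}%")
```

With the RESULT 12 values this prints 10.2% and 44.1%, which agrees with the summary lines reported for that hit.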



RESULT 13 

CNS01FKL 

LOCUS       CNS01FKL 1028 bp DNA GSS 01-JUN-2001
DEFINITION  Anopheles gambiae GSS T7 end of clone 04L01 of NotreDame1 library
            from strain PEST of Anopheles gambiae (African malaria mosquito),
            genomic survey sequence.
ACCESSION   AL141958
VERSION     AL141958.1 GI:7000076
KEYWORDS    GSS.
SOURCE      African malaria mosquito.
  ORGANISM  Anopheles gambiae
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Nematocera;
            Culicoidea; Anopheles.
REFERENCE   1 (bases 1 to 1028)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-2000) Genoscope - Centre National de Sequencage
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
REFERENCE   2 (bases 1 to 1028)
  AUTHORS   Roth,C.W., Brey,P.T., Ke,Z., Collins,F.H. and Weissenbach,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-2000) BBMI, Institut Pasteur, 25, rue du Dr.
            Roux, Paris 75015, France
COMMENT     This clone is from an A. gambiae BAC library provided by F.H.
            Collins and sequenced by Genoscope in collaboration with the
            Laboratory of Biochem. and Biol. Molec. of Insects, Institut
            Pasteur.
FEATURES             Location/Qualifiers
     source          1..1028
                     /organism="Anopheles gambiae"
                     /strain="PEST"
                     /db_xref="taxon:7165"
                     /clone="04L01"
                     /clone_lib="NotreDame1"
                     /note="end : T7"
BASE COUNT  213 a 339 c 202 g 187 t 87 others
ORIGIN



Query Match 10.1%; Score 41.6; DB 13; Length 1028; 

Best Local Similarity 43.5%; Pred. No. 2.2; 

Matches 74; Conservative 21; Mismatches 75; Indels 0; Gaps 0;

Qy 77 gcgataccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccacca 136 

II I I I I I I I I .1 I I I : I : I I I I I I I I I I M 1:111 II: M 
Db 120 GGGGGACCAATAAAAACCCCCAAMCCMCCTAAACCAACCTACCAMCCMCCCATCCCMACA 179

Qy 137 caacaatgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatgg 196 

I I : I I : I I I I : I I : II: I : I I : I I I I I : 

Db 180 CCCCCCCAMCCCCMCCYCCCCACMYCCCMYCCAYCTCCMMTAAMMCCYCCCCCCCTYYCC 239 

Qy 197 tagagccccaccccttcgctcgcaatcccatcaccatgacccctcacgcc 246 

: : : I I : I I I I 1 I II: I : II : I : I I : I II 
Db 240 CCCCCYYYCCCYCCCCCCCCCCCCCCCCYTTYCCCCCCYYCYCCCMCCCC 289



RESULT 14 

BF973142 

LOCUS       BF973142 1344 bp mRNA EST 22-JAN-2001
DEFINITION  602242133F1 NIH_MGC_46 Homo sapiens cDNA clone IMAGE:4330640 5',
            mRNA sequence.
ACCESSION   BF973142
VERSION     BF973142.1 GI:12340459
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 1344)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: ATCC
            cDNA Library Preparation: Ling Hong/Rubin Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLCM1194 row: a column: 09
            High quality sequence stop: 168.
FEATURES             Location/Qualifiers
     source          1..1344
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:4330640"
                     /clone_lib="NIH_MGC_46"
                     /tissue_type="leiomyosarcoma cell line"
                     /lab_host="DH10B (phage-resistant)"
                     /note="Organ: uterus; Vector: pOTB7; Site_1: XhoI; Site_2:
                     EcoRI; cDNA made by oligo-dT priming. Directionally cloned
                     into EcoRI/XhoI sites using the following 5' adaptor:
                     GGCACGAG(G). Size-selected >500bp for average insert size
                     1.8kb. Library constructed by Ling Hong in the laboratory
                     of Gerald M. Rubin (University of California, Berkeley)
                     using ZAP-cDNA synthesis kit (Stratagene) and Superscript
                     II RT (Life Technologies). Note: this is a NIH_MGC
                     Library."
BASE COUNT  268 a 768 c 151 g 154 t 3 others
ORIGIN



Query Match 10.1%; Score 41.6; DB 11; Length 1344; 

Best Local Similarity 51.6%; Pred. No. 2.3; 

Matches 95; Conservative 0; Mismatches 89; Indels 0; Gaps 0; 

Qy 83 ccaatacacccatcccaacactccaaaccaaccaacacttcaaccaaaccaccacaacaa 142 

III II I I I I I I I I I II II II I I I I I I I I I I 
Db 410 CCCACCCACTCCACACCACACCCTCTAGCCTCCGCCCCCACCACCCTCCACCCGCCACCC 469

Qy 143 tgccttcagtaacccaggcccgtctcatgtggcgtagcgtcgcccgcggcatggtagagc 202 

III I III I M I I I III I I I I I I I I I II III 

Db 470 CCCCTGCCCCCCCCCCCGCCTGCCCCCCCTGCCACACTATCGCCCGCCCCACCAGCGACC 529

Qy 203 cccaccccttcgctcgcaatcccatcaccatgacccctcacgcctggcgcgccgccgacc 262 

I I I I I I I I I I I I I I I I J II I I I I I I I I I I I I I 
Db 530 ACTCCTCCTGCCCTCCCACTCCCTCACCACTCCCTCTCCACTCCTAGCCTGACCCCACAC 589

Qy 263 tctc 266 
I I I 

Db 590 GCTC 593 



RESULT 15 

BF868167 

LOCUS       BF868167 1737 bp mRNA EST 19-JAN-2001
DEFINITION  963101A02.y1 C. reinhardtii CC-1690, Stress condition I, normalized,
            Lambda Zap II Chlamydomonas reinhardtii cDNA, mRNA sequence.
ACCESSION   BF868167
VERSION     BF868167.1 GI:12258311
KEYWORDS    EST.
SOURCE      Chlamydomonas reinhardtii.
  ORGANISM  Chlamydomonas reinhardtii
            Eukaryota; Viridiplantae; Chlorophyta; Chlorophyceae; Volvocales;
            Chlamydomonadaceae; Chlamydomonas.
REFERENCE   1 (bases 1 to 1737)
  AUTHORS   Grossman,A., Davies,J., Federspiel,N., Harris,E., Hauser,C.,
            Lefebvre,P., McDermott,J.P., Shrager,J., Silflow,C. and Stern,D.
  TITLE     Analyses of the Chlamydomonas reinhardtii Genome: A Model,
            Unicellular System for Analyzing Gene Function and Regulation in
            Vascular Plants; project phase 3
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Charles Hauser
            DCMB Box 91000
            Duke University
            Durham, NC 27708-1000
            Tel: 919 613 8159
            Fax: 919 613 8177
            Email: chauser@duke.edu.
FEATURES             Location/Qualifiers
     source          1..1737
                     /organism="Chlamydomonas reinhardtii"
                     /strain="CC-1690 wild type mt+ 21gr"
                     /db_xref="taxon:3055"
                     /clone_lib="C. reinhardtii CC-1690, Stress condition I,
                     normalized, Lambda Zap II"
                     /note="Vector: pBluescript II SK-; Site_1: EcoRI; Site_2:
                     XhoI; This library, constructed by John Davies and Jeffrey
                     McDermott, combines cDNAs from CC-1690 cells grown to
                     mid-log phase in TAP-N (30 min, 1hr, 4hr), TAP-S (30 min,
                     1hr, 4hr), TAP-P (4hr, 12hr, 24hr), NO3 to NH4 (30min, 1hr,
                     4hr) and NH4 to NO3 (30min, 1hr, 4hr). PolyA mRNA was
                     purified from each sample, pooled and cDNA synthesized.
                     The cDNA was directionally cloned into lambda Zap II
                     (Stratagene) in the EcoRI (5') and XhoRI (3') sites.
                     pBluescript II SK- plasmids were excised from the lambda
                     ZAP clones by superinfection with ExAssist (Stratagene)
                     phage. The library was normalized using method 4 described
                     in Bonaldo et al (1996) Genome Research 6: 791-806."
BASE COUNT  506 a 968 c 129 g 117 t 17 others
ORIGIN



Query Match 10.1%; Score 41.4; DB 11; Length 1737;

Best Local Similarity 53.8%; Pred. No. 2.8;

Matches 84; Conservative 0; Mismatches 72; Indels 0; Gaps 0;

Qy 88

Db 609

Qy 148

Db 669

Qy 208

Db 729



Search completed: February 7, 2002, 08:20:58 
Job time: 18135 sec 



